<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Science Archives - Analytica Data Science Solutions</title>
	<atom:link href="https://analyticadss.com/tag/data-science/feed/" rel="self" type="application/rss+xml" />
	<link>https://analyticadss.com/tag/data-science/</link>
	<description>World&#039;s Leading Artificial Intelligence Company</description>
	<lastBuildDate>Sat, 26 Aug 2023 09:33:24 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://analyticadss.com/wp-content/uploads/2020/06/cropped-F.B-Cover-photo_V0.1-02-32x32.png</url>
	<title>Data Science Archives - Analytica Data Science Solutions</title>
	<link>https://analyticadss.com/tag/data-science/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Decoding the Buzz: AI, ML, and Data Science Unveiled</title>
		<link>https://analyticadss.com/decoding-the-buzz-ai-ml-and-data-science-unveiled/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Mon, 14 Aug 2023 01:02:21 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Future Technology]]></category>
		<category><![CDATA[Technology]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=6205</guid>

					<description><![CDATA[<p>In this digital era, where technology shapes the contours of our everyday lives, certain terms have gained near-celebrity status. The fields of Artificial Intelligence (AI), Machine Learning (ML), and Data Science aren’t simply trendy jargon; they’re cornerstone technologies steering tomorrow. So, [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/decoding-the-buzz-ai-ml-and-data-science-unveiled/">Decoding the Buzz: AI, ML, and Data Science Unveiled</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[		<div data-elementor-type="wp-post" data-elementor-id="6205" class="elementor elementor-6205">
						<section class="elementor-section elementor-top-section elementor-element elementor-element-3c333821 elementor-section-boxed elementor-section-height-default elementor-section-height-default" data-id="3c333821" data-element_type="section" data-e-type="section">
						<div class="elementor-container elementor-column-gap-default">
					<div class="elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-21a434b" data-id="21a434b" data-element_type="column" data-e-type="column">
			<div class="elementor-widget-wrap elementor-element-populated">
						<div class="elementor-element elementor-element-2cd2c296 elementor-widget elementor-widget-text-editor" data-id="2cd2c296" data-element_type="widget" data-e-type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
									<p></p>
<h1 id="962e" class="wp-block-heading"><strong>Stepping into the Future: A Comprehensive Dive into AI, ML, and Data Science</strong></h1>
<p></p>
<p></p>
<p id="4f62">In this digital era, where technology shapes the contours of our everyday lives, certain terms have gained near-celebrity status. The fields of Artificial Intelligence (AI), Machine Learning (ML), and Data Science aren’t simply trendy jargon — they’re cornerstone technologies steering tomorrow. So, what lies behind these terms? How do they stand apart, and at which points do they converge?</p>
<p></p>
<p></p>
<p id="3c96">Welcome to our series: “<strong>Decoding the Buzz: AI, ML, and Data Science Unveiled</strong>”.</p>
<p></p>
<p></p>
<p id="e1f9">Over the next few articles, we’ll:</p>
<p></p>
<p></p>
<ul class="wp-block-list">
<li>Delve deep into the foundational knowledge behind these technologies.</li>
<li>Highlight their real-world applications and impacts.</li>
<li>Guide aspiring enthusiasts on learning resources, hands-on projects, and ways to showcase their skills.</li>
</ul>
<p></p>
<p></p>
<p></p>
<p id="edc9">Why now? In our interconnected world powered by data, grasping these subjects is not just for tech aficionados anymore — it’s crucial for various industries and professions. From banking to healthcare, from entertainment to production lines, these technological advancements are pushing the boundaries like never before.</p>
<p></p>
<p></p>
<p id="3107">Whether you’re a student, a professional, or simply someone intrigued by these technologies, this series is tailored for you. The best part? No prior knowledge is required. All you need is curiosity.</p>
<p></p>
<p></p>
<p id="f34a">In today’s kickoff, we’re setting the stage by introducing these influential technologies, their overlaps, and their individual distinctions. Let’s unravel the magic behind the buzz.</p>
<p></p>
<p></p>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<p></p>
<p></p>
<h1 id="8981" class="wp-block-heading">The Rise of AI, ML, and Data Science</h1>
<p></p>
<p></p>
<p id="96db">From the conceptual imaginings of early thinkers to present-day applications that touch nearly every aspect of our daily lives, AI, ML, and Data Science have undergone a remarkable evolution.</p>
<p></p>
<p></p>
<p id="acff"><strong>Artificial Intelligence (AI)</strong>: The notion of devices emulating human cognition has roots in age-old tales of mechanical beings and intelligent contraptions. The contemporary understanding of AI, however, began to take shape in the mid-1900s. In 1950, Alan Turing posed the question “Can machines think?”, and the test he proposed, now known as the Turing Test, became a foundational concept. The 1956 Dartmouth Workshop then introduced the term “Artificial Intelligence.” Since then, advances in AI have led to innovations such as Siri and Alexa, which assist us daily, and to AI-driven news algorithms that tailor our reading experiences.</p>
<p></p>
<p></p>
<p id="1f4c"><strong>Machine Learning (ML)</strong>: A subset of AI, ML focuses on enabling machines to learn from data. The concept can be traced back to the perceptron in the 1950s, an attempt at mimicking neuron functions. Today, ML impacts us in myriad ways: from YouTube’s video recommendations based on viewing habits to fitness trackers predicting our health trends using historical data.</p>
<p></p>
<p></p>
<p id="078e"><strong>Data Science</strong>: While data analysis has always been pivotal, the sheer volume of digital data produced in recent decades necessitated a new discipline. Data Science combines statistical methods, ML techniques, and domain expertise to glean insights from vast datasets. Every time we shop online and receive personalized shopping suggestions or use navigation apps that predict traffic and suggest optimal routes, we’re experiencing the influence of data science.</p>
<p></p>
<p></p>
<p id="da29"><strong>Real-World Examples</strong>:</p>
<p></p>
<p></p>
<ul class="wp-block-list">
<li><strong>Smartphones</strong>: From photo categorization based on facial recognition to predictive text while messaging, AI and ML are deeply integrated into our mobile experiences.</li>
<li><strong>Online Shopping</strong>: Platforms like Amazon and eBay use ML to offer product recommendations, adjusting to our preferences with every purchase or search.</li>
<li><strong>Entertainment</strong>: Netflix and Spotify employ data science to curate bespoke playlists and movie recommendations. Their algorithms learn from every song we listen to or movie we watch.</li>
<li><strong>Home Automation</strong>: Smart thermostats like Nest learn our preferred temperatures throughout the day, adjusting automatically to save energy and enhance comfort.</li>
<li><strong>Health &amp; Fitness</strong>: Wearables like Fitbit predict health trends, track sleep patterns, and offer insights, all thanks to data science and ML algorithms.</li>
<li><strong>Banking</strong>: From fraud detection algorithms that monitor suspicious activities to chatbots that assist in answering queries, AI has revolutionized the financial sector.</li>
<li><strong>Daily Commute</strong>: Apps like Waze and Google Maps analyze real-time traffic data to optimize routes, predict journey durations, and even locate amenities, all harnessing the power of data science.</li>
</ul>
<p></p>
<p></p>
<p></p>
<p id="a1e4">Today, AI, ML, and Data Science are not just confined to tech labs or business sectors; they are integrated into the fabric of our daily existence, enhancing our experiences, making processes efficient, and offering insights that were once unimaginable.</p>
<p></p>
<p></p>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<p></p>
<p></p>
<h1 id="8c20" class="wp-block-heading">AI, ML, and Data Science: More Than Just Buzzwords</h1>
<p></p>
<p></p>
<p id="bac0">In the bustling lanes of technology, the terms AI, ML, and Data Science frequently pop up, sometimes interchangeably. Yet, they each have distinct definitions, scopes, and applications. Let’s demystify these terms and shine a light on their key differences.</p>
<p></p>
<p></p>
<p><strong>Artificial Intelligence (AI):</strong></p>
<p></p>
<p></p>
<p id="3b1a" style="padding-left: 40px;"><strong>Definition</strong>: AI is the broad field of engineering machines to perform tasks that would require intelligence if carried out by humans. These tasks range from simpler ones, such as recognizing patterns, to more intricate ones, such as making decisions or solving problems.</p>
<p></p>
<p></p>
<p id="5a94" style="padding-left: 40px;"><strong>Examples</strong>:</p>
<p style="padding-left: 40px;"><strong>Natural Language Processing (NLP)</strong>: Virtual assistants like Siri or Alexa understand and respond to your voice commands.</p>
<p id="69a8" style="padding-left: 40px;"><strong>Computer Vision</strong>: Snapchat or Instagram filters recognize and adapt to human faces.</p>
<p style="padding-left: 40px;"><strong>Robotics</strong>: Robots like Boston Dynamics’ Spot navigate various terrains and perform coordinated dances.</p>
<p></p>
<p></p>
<p id="f296" style="padding-left: 40px;"><strong>Note</strong>: <em>AI is the overarching domain under which ML falls. Not all AI needs to learn from data; some AI systems follow predefined algorithms or sets of rules</em>.</p>
<p></p>
<p></p>
<p id="9763"><strong>Machine Learning (ML)</strong>:</p>
<p></p>
<p></p>
<p id="0230" style="padding-left: 40px;"><strong>Definition</strong>: ML is a subset of AI that gives machines the ability to learn and improve from experience without being explicitly programmed for each specific task. It uses algorithms to parse data, learn from it, and then apply what has been learned to make informed decisions.</p>
<p></p>
<p></p>
<p id="55f1" style="padding-left: 40px;"><strong>Examples</strong>:</p>
<p></p>
<p></p>
<p id="badc" style="padding-left: 40px;"><strong>Recommendation Systems:</strong> Netflix suggests movies based on your viewing history.</p>
<p></p>
<p></p>
<p id="c9fb" style="padding-left: 40px;"><strong>Predictive Texting</strong>: Your smartphone’s keyboard predicts the next word as you type.</p>
<p></p>
<p></p>
<p id="82f4" style="padding-left: 40px;"><strong>Fraud Detection</strong>: Credit card companies detect unusual spending patterns to prevent fraudulent activities.</p>
<p></p>
<p></p>
<p id="4e7a" style="padding-left: 40px;"><strong>Note</strong>: While all ML is AI, not all AI is ML. ML specifically centers on systems that can learn from data, while AI encompasses a broader range of intelligent functionalities.</p>
<p></p>
<p></p>
<p id="c317"><strong>Data Science</strong>:</p>
<p></p>
<p></p>
<p id="5819" style="padding-left: 40px;"><strong>Definition</strong>: <em>Data Science is a multidisciplinary field that uses scientific methods, algorithms, processes, and systems to extract knowledge and insights from structured and unstructured data</em>. While it encompasses aspects of ML, its main goal is to derive analytical insights and information from data.</p>
<p></p>
<p></p>
<p id="0b14" style="padding-left: 40px;"><strong>Examples</strong>:</p>
<p></p>
<p></p>
<p id="11ae" style="padding-left: 40px;"><strong>Consumer Behavior Analysis</strong>: E-commerce platforms analyze user clicks, cart additions, and purchases to derive sales insights.</p>
<p></p>
<p></p>
<p id="3793" style="padding-left: 40px;"><strong>Health Analytics</strong>: Hospitals predict patient admission rates based on past data.</p>
<p></p>
<p></p>
<p id="6d38" style="padding-left: 40px;"><strong>Sports Analytics</strong>: Teams analyze player performances and game strategies using collected data to make informed decisions.</p>
<p></p>
<p></p>
<p id="012f" style="padding-left: 40px;"><strong>Note</strong>: Data Science involves a broader process that includes data collection, cleaning, exploration, and feature engineering, and often uses ML as a tool to predict or classify outcomes from data.</p>
<p></p>
<p></p>
<p id="26bb"><strong>Distinguishing the Three</strong>:</p>
<p></p>
<p></p>
<p id="565f"><strong>Scope</strong>: AI has the broadest scope, encompassing any task performed by a machine that would require intelligence if done by a human. ML is specific to learning from data. Data Science, meanwhile, is centered around the entire process of handling, analyzing, and visualizing data.</p>
<p></p>
<p></p>
<p id="8be4"><strong>Application</strong>: While AI might be about creating an intelligent chatbot, ML would dictate how the chatbot learns from user interactions, and Data Science would analyze the patterns and frequencies of questions, user sentiments, and more.</p>
<p></p>
<p></p>
<p id="9e13"><strong>Tools &amp; Techniques</strong>: AI might leverage rule-based systems, robotics, or computer vision, among others. ML emphasizes algorithms like neural networks, decision trees, or clustering. Data Science frequently utilizes tools and platforms for handling big data, like Hadoop or Spark, along with analytical techniques and visualization tools.</p>
<p></p>
<p></p>
<p id="6cc2">In essence, while these terms are intertwined and often overlap, they each have unique characteristics and roles in the tech landscape. Together, they forge a powerful trio that’s reshaping our world.</p>
<p></p>
<p></p>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<p></p>
<p></p>
<h1 id="346d" class="wp-block-heading">The Future Landscape</h1>
<p></p>
<p></p>
<p id="4c7e">As we stand at the crossroads of innovation and technology, it’s exhilarating to ponder the road ahead. The fusion of AI, ML, and Data Science has already catalyzed unprecedented changes. But what does the horizon hold? Dive into a future sculpted by data, algorithms, and human ingenuity as we explore the potential and promise of these transformative fields.</p>
<p></p>
<p></p>
<ul class="wp-block-list">
<li><strong>Hyper-Personalization</strong>: As data continues to grow exponentially, companies will be able to offer even more tailored experiences. Imagine a world where your smart home knows your mood based on biometric data and plays music or adjusts lighting accordingly.</li>
<li><strong>Ethical AI</strong>: As AI systems make more decisions, there will be an increased emphasis on ethical considerations, transparency, and fairness in algorithms. Efforts towards explainable AI, which provides insights into how AI models make decisions, will gain traction.</li>
<li><strong>Job Landscape Shift</strong>: While there are concerns about AI and automation leading to job losses, they’ll also create new roles and opportunities. Emphasis will be on roles that involve managing, interpreting, and leveraging AI tools.</li>
<li><strong>Healthcare Revolution</strong>: We’re on the cusp of a healthcare transformation. AI might play a role in predicting outbreaks, personalizing medical treatments down to the genetic level, and possibly even in mental health assessment and therapy.</li>
</ul>
<p></p>
<p></p>
<p></p>
<p id="2978">In a world increasingly driven by data, the trio of AI, ML, and Data Science will undoubtedly be at the forefront of the next wave of innovations, driving progress and addressing challenges in ways we’re just beginning to imagine.</p>
<p></p>
<p></p>
<p id="aab3">The domains of AI, ML, and Data Science aren’t mere temporary tech fads; they’re laying the groundwork for our unified digital future. As they develop and intertwine in complex ways, they’re set to transform industries, redefine how users interact, and consistently challenge the limits of what we think is achievable. Whether you’re a newcomer with a passion, an experienced expert, or just someone interested in watching from the sidelines, these innovations are poised to have a considerable impact on our shared future. As we conclude this section, remember that we’re only at the beginning of our exploration into this expansive territory. Keep asking questions, keep learning, and let’s navigate the possibilities of tomorrow together.</p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>								</div>
				</div>
					</div>
		</div>
					</div>
		</section>
				</div>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Unleash the Power of Functional Programming in R with the purrr Package</title>
		<link>https://analyticadss.com/unleash-the-power-of-functional-programming-in-r-with-the-purrr-package/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Fri, 14 Apr 2023 18:10:01 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R Statistical Language]]></category>
		<category><![CDATA[functional programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Rstats]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=6138</guid>

					<description><![CDATA[<p>Welcome to our comprehensive guide on harnessing the power of the purrr package in R for functional programming. If you’re keen on elevating your R skills, you’re in for a treat. Today, we’ll be delving into the wonders of the purrr package, a lifesaver for functional programming. With the avalanche of data we encounter nowadays, having the [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/unleash-the-power-of-functional-programming-in-r-with-the-purrr-package/">Unleash the Power of Functional Programming in R with the purrr Package</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading has-medium-font-size" id="9372">Introduction</h2>



<p id="dd36">Welcome to our comprehensive guide on harnessing the power of the <code>purrr</code> package in R for functional programming. If you’re keen on elevating your R skills, you’re in for a treat. With the avalanche of data we encounter nowadays, having the right tools for efficient data wrangling is paramount. If you’ve dabbled in R, you may have found certain built-in functions lacking, especially when grappling with intricate operations.</p>



<p id="a3fa">This is where <code>purrr</code> strides in, offering a plethora of robust tools to fine-tune your code, making it not only clearer but also more maintainable.</p>



<p id="e548">Throughout this article, we’ll journey through the intricacies of the <code>purrr</code> package, elucidate its fundamental functions, and showcase its real-world applicability. We’ll also touch on how it can enrich your experience with R, making it more fruitful. By the time you reach the end, you’ll be well-versed in the magic of <code>purrr</code> and ready to wield its power in your data endeavors. Let’s embark on this insightful voyage into the realm of R and unravel the capabilities of the <code>purrr</code> package.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h2 class="wp-block-heading has-medium-font-size" id="e803"><strong>What is functional programming and why is it useful?</strong></h2>



<p id="fd43">Functional programming isn’t merely a way to write code; it’s a philosophical shift that guides how we approach computation. By treating computation as the evaluation of mathematical functions, it forgoes changes of state and avoids mutable data. Instead, it relies on pure functions that take given inputs and produce predictable outputs, free of side effects. The outcome? Code that’s more modular, predictable, and test-friendly.</p>



<p id="0e06">Now, if you’re working with R, particularly for data manipulation and analysis, functional programming can be a game-changer. It lets you create more coherent and succinct code, and here’s how:</p>



<ol class="wp-block-list">
<li>Enhanced Readability and Maintainability: Decomposing complex procedures into smaller, more digestible functions makes your code easier to understand and easier to change as needed.</li>



<li>Boosted Productivity: By steering clear of traps like global variables, which may lead to unforeseen behaviors and debugging headaches, functional programming saves time and frustration.</li>



<li>Optimized Performance: A functional style can also make your code more efficient, because it encourages vectorized operations in place of explicit loops. That said, a mapping function is not automatically faster than a well-written loop; the more dependable gain is clarity.</li>
</ol>
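
<p>To make the first two points concrete, here is a minimal sketch in base R. The names <code>tax_rate</code>, <code>add_tax_impure()</code>, and <code>add_tax_pure()</code> are hypothetical and exist only for illustration. The sketch contrasts a function that quietly depends on a global variable with a pure function whose result depends only on its arguments.</p>

<pre class="wp-block-code"><code># Impure: the result depends on a global variable that can change elsewhere.
tax_rate &lt;- 0.10
add_tax_impure &lt;- function(price) {
  price * (1 + tax_rate)
}

# Pure: everything the function needs is passed in as an argument,
# so the same inputs always give the same output.
add_tax_pure &lt;- function(price, rate) {
  price * (1 + rate)
}

prices &lt;- c(10, 20, 30)

# Vectorized call: the whole vector is processed without an explicit loop.
with_tax &lt;- add_tax_pure(prices, rate = 0.10)

# Changing the global silently changes the impure function's result,
# while the pure function is unaffected.
tax_rate &lt;- 0.25
changed &lt;- add_tax_impure(prices)
unchanged &lt;- add_tax_pure(prices, rate = 0.10)</code></pre>

<p>Because <code>add_tax_pure()</code> never reads anything outside its arguments, it can be tested in isolation and reused safely, which is the property the mapping functions discussed later rely on.</p>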



<p id="b77b">Eager to tap into these benefits? Read on! We’ll dive into the <code>purrr</code> package, an invaluable asset for adopting functional programming in R. Through its power, you can not only elevate your data manipulation and analysis routines but also bring more enjoyment and effectiveness to your programming journey.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h2 class="wp-block-heading has-medium-font-size" id="3fdc">Exploring the <code>purrr</code> package</h2>



<p id="283b">Belonging to the tidyverse collection, the <code>purrr</code> package serves as R’s gateway to functional programming. It’s packed with dynamic functions crafted to ease tasks when working with lists and a variety of data structures. Adopting <code>purrr</code> ensures that your data transformation, summarization, and manipulation processes benefit from a unified and logical syntax.</p>



<p id="382c">Let’s delve into what sets <code>purrr</code> apart:</p>



<ol class="wp-block-list">
<li>Uniformity in Function Naming: One of <code>purrr</code>’s strengths is its organized naming structure, which makes its functions easier to recall and use.</li>



<li>Proficiency with Complex Data Structures: Be it nested lists, data frames, or any layered data structure, <code>purrr</code> stands out in its management capabilities.</li>



<li>Robust Error Management: Real-world data can be messy. <code>purrr</code> lends a hand by equipping you with functions that elegantly tackle errors and unexpected scenarios.</li>



<li>Harmony with <code>tidyverse</code> Companions: A significant advantage is <code>purrr</code>’s compatibility with other <code>tidyverse</code> packages such as <code>dplyr</code>, <code>tidyr</code>, and <code>ggplot2</code>. This cohesion allows for a smoother integration of functional programming into your existing data routines.</li>
</ol>
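
<p>The error-management point can be illustrated with a short sketch, assuming <code>purrr</code> is already installed and loaded. <code>possibly()</code> is a <code>purrr</code> function that wraps another function so it returns a fallback value instead of stopping on an error. The <code>inputs</code> list and the name <code>safe_log</code> are made up for this example.</p>

<pre class="wp-block-code"><code>library(purrr)

# Hypothetical messy input: one element is not numeric.
inputs &lt;- list(1, "a", 100)

# Wrap log() so that a failure returns NA instead of aborting the whole map.
safe_log &lt;- possibly(log, otherwise = NA_real_)

# map_dbl() applies safe_log to every element and returns a double vector.
results &lt;- map_dbl(inputs, safe_log)</code></pre>

<p>A plain <code>map_dbl(inputs, log)</code> call would stop with an error at the second element, whereas the wrapped version records a missing value there and carries on with the rest.</p>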



<p id="298a">Keen to commence your <code>purrr</code> journey? Simply install it from CRAN and load it in your R session.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.708335876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="install.packages(&quot;purrr&quot;)
library(purrr)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #66D9EF">install.packages</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;purrr&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(purrr)</span></span></code></pre></div>



<p id="51fd">In the next section, we will dive into the core functions provided by the <code>purrr</code> package and demonstrate their usage with practical examples.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h2 class="wp-block-heading has-medium-font-size" id="b5b5"><strong>Core functions in purrr</strong></h2>



<p id="3c30">In this section, we will cover some of the most important and widely used functions in the <code>purrr</code> package, along with examples to demonstrate their usage.</p>



<p id="ac1b"><strong>A. The map() family</strong></p>



<p id="0f09">The <code>map()</code> family of functions is the heart of the <code>purrr</code> package. These functions allow you to apply a function to each element of a list or a vector and return the results in a specified format.</p>



<ul class="wp-block-list">
<li><code>map()</code>: Returns a list.</li>



<li><code>map_lgl()</code>: Returns a logical vector.</li>



<li><code>map_int()</code>: Returns an integer vector.</li>



<li><code>map_dbl()</code>: Returns a double vector.</li>



<li><code>map_chr()</code>: Returns a character vector.</li>



<li><code>map_df()</code>: Returns a data frame built by row-binding the results (requires the <code>dplyr</code> package).</li>
</ul>



<p id="e748">Example:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.7083282470703125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Define a list of numbers
number_list <- list(1, 2, 3, 4)

# Square each number using map()
squared_numbers <- map(number_list, ~ .x^2)
print(squared_numbers)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Define a list of numbers</span></span>
<span class="line"><span style="color: #F8F8F2">number_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Square each number using map()</span></span>
<span class="line"><span style="color: #F8F8F2">squared_numbers </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> map(number_list, </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> .x</span><span style="color: #F92672">^</span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(squared_numbers)</span></span></code></pre></div>
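
<p>The typed variants listed above can be sketched in the same way. This is a minimal illustration that reuses the <code>number_list</code> idea from the example; the variable names <code>squares</code>, <code>is_even</code>, and <code>labels</code> are made up for this sketch.</p>

<pre class="wp-block-code"><code>library(purrr)

number_list &lt;- list(1, 2, 3, 4)

# map_dbl() returns a double vector instead of a list
squares &lt;- map_dbl(number_list, ~ .x^2)

# map_lgl() returns a logical vector
is_even &lt;- map_lgl(number_list, ~ .x %% 2 == 0)

# map_chr() returns a character vector
labels &lt;- map_chr(number_list, ~ paste0("item_", .x))

print(squares)  # 1 4 9 16
print(is_even)  # FALSE TRUE FALSE TRUE
print(labels)   # "item_1" "item_2" "item_3" "item_4"</code></pre>

<p>Each typed variant raises an error if a result does not fit the requested type, which makes type problems visible early.</p>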



<p id="89fb"><strong>B. pmap()</strong></p>



<p id="1c29">The <code>pmap()</code> function applies a function to several lists (or vectors) in parallel. You pass the inputs together as a single list, and each call receives one element from every input.</p>



<p id="f317">Example:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.7083282470703125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Define two lists
list1 <- list(1, 2, 3)
list2 <- list(4, 5, 6)

# Add corresponding elements of the two lists using pmap()
sum_list <- pmap(list(list1, list2), ~ ..1 + ..2)
print(sum_list)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Define two lists</span></span>
<span class="line"><span style="color: #F8F8F2">list1 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">list2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">5</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">6</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Add corresponding elements of the two lists using pmap()</span></span>
<span class="line"><span style="color: #F8F8F2">sum_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> pmap(</span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(list1, list2), </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> ..1 </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> .</span><span style="color: #AE81FF">.2</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(sum_list)</span></span></code></pre></div>
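
<p>When the input list is named, <code>pmap()</code> matches those names to the arguments of the function, which can be easier to read than <code>..1</code> and <code>..2</code>. The sketch below assumes a hypothetical helper, <code>power_plus_offset()</code>, and made-up input values; it uses <code>pmap_dbl()</code> to return a double vector.</p>

<pre class="wp-block-code"><code>library(purrr)

# Hypothetical inputs: three parallel vectors stored in a named list
params &lt;- list(
  base = c(1, 2, 3),
  exponent = c(2, 2, 3),
  offset = c(0, 1, -1)
)

# The names in `params` are matched to these argument names
power_plus_offset &lt;- function(base, exponent, offset) {
  base^exponent + offset
}

# One call per position: (1, 2, 0), (2, 2, 1), (3, 3, -1)
results &lt;- pmap_dbl(params, power_plus_offset)
print(results)  # 1 5 26</code></pre>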



<p id="0ca9"><strong>C. safely(), quietly(), and possibly()</strong></p>



<p id="ee39">These functions wrap another function so that errors, warnings, and messages are captured rather than halting the whole computation. The wrapped function then behaves as follows:</p>



<ul class="wp-block-list">
<li><code>safely()</code>: Returns a list containing the result and any error encountered.</li>



<li><code>quietly()</code>: Returns a list containing the result, printed output, warnings, and messages.</li>



<li><code>possibly()</code>: Returns a default value if an error is encountered.</li>
</ul>



<p id="a799">Example:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.708335876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Define a list with numbers and a character
mixed_list <- list(1, 2, &quot;a&quot;, 3)

# Define a safely wrapped square function
safe_square <- safely(~ .x^2)

# Apply the safe_square function to the mixed_list
results <- map(mixed_list, safe_square)
print(results)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Define a list with numbers and a character</span></span>
<span class="line"><span style="color: #F8F8F2">mixed_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;a&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Define a safely wrapped square function</span></span>
<span class="line"><span style="color: #F8F8F2">safe_square </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> safely(</span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> .x</span><span style="color: #F92672">^</span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Apply the safe_square function to the mixed_list</span></span>
<span class="line"><span style="color: #F8F8F2">results </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> map(mixed_list, safe_square)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(results)</span></span></code></pre></div>
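
<p>For comparison, here is a rough sketch of all three wrappers on the same kind of input. The names <code>possible_square</code>, <code>failed</code>, and <code>quiet_log</code> are illustrative, and <code>NA_real_</code> is just one reasonable choice of default value.</p>

<pre class="wp-block-code"><code>library(purrr)

mixed_list &lt;- list(1, 2, "a", 3)

# possibly(): replace any error with a default value
possible_square &lt;- possibly(~ .x^2, otherwise = NA_real_)
squares &lt;- map_dbl(mixed_list, possible_square)
print(squares)  # 1 4 NA 9

# safely(): every element is a list with `result` and `error`
safe_square &lt;- safely(~ .x^2)
safe_results &lt;- map(mixed_list, safe_square)

# Flag the elements whose call raised an error
failed &lt;- map_lgl(safe_results, ~ !is.null(.x$error))
print(failed)  # FALSE FALSE TRUE FALSE

# quietly(): capture warnings instead of printing them
quiet_log &lt;- quietly(log)
logged &lt;- quiet_log(-1)
print(logged$result)    # NaN
print(logged$warnings)  # the captured warning text</code></pre>

<p>In practice, <code>possibly()</code> tends to suit cases where a placeholder value is enough, while <code>safely()</code> is more useful when you need to inspect what went wrong.</p>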



<p id="cbfb"><strong>D. compact() and compose()</strong></p>



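<p id="37ee"><code>compact()</code> removes empty elements from a list, meaning elements that are <code>NULL</code> or have length zero, while <code>compose()</code> combines multiple functions into a single function. By default, <code>compose()</code> applies the functions from right to left, so the last function listed runs first.</p>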



<p id="a367">Example:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.402778625488281px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Define a list with NULL elements
null_list <- list(1, NULL, 2, NULL, 3)

# Remove NULL elements using compact()
clean_list <- compact(null_list)
print(clean_list)

# Compose two functions: square and increment
square <- function(x) x^2
increment <- function(x) x + 1
square_and_increment <- compose(increment, square)

# Apply the composed function to a number
result <- square_and_increment(3)
print(result)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Define a list with NULL elements</span></span>
<span class="line"><span style="color: #F8F8F2">null_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">NULL</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">NULL</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Remove NULL elements using compact()</span></span>
<span class="line"><span style="color: #F8F8F2">clean_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> compact(null_list)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(clean_list)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Compose two functions: square and increment</span></span>
<span class="line"><span style="color: #A6E22E">square</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(x) x</span><span style="color: #F92672">^</span><span style="color: #AE81FF">2</span></span>
<span class="line"><span style="color: #A6E22E">increment</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(x) x </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">1</span></span>
<span class="line"><span style="color: #F8F8F2">square_and_increment </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> compose(increment, square)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Apply the composed function to a number</span></span>
<span class="line"><span style="color: #F8F8F2">result </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> square_and_increment(</span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(result)</span></span></code></pre></div>
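
<p>The order in which <code>compose()</code> applies its functions is easy to get wrong, so a small sketch may help. It reuses <code>square</code> and <code>increment</code> from the example and contrasts the default direction with <code>.dir = "forward"</code>; the names of the composed functions are made up for illustration.</p>

<pre class="wp-block-code"><code>library(purrr)

square &lt;- function(x) x^2
increment &lt;- function(x) x + 1

# Default (.dir = "backward"): the last function listed runs first
square_then_increment &lt;- compose(increment, square)

# .dir = "forward": functions run in the order they are listed
increment_then_square &lt;- compose(increment, square, .dir = "forward")

print(square_then_increment(3))  # 3^2 + 1 = 10
print(increment_then_square(3))  # (3 + 1)^2 = 16</code></pre>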



<p>These core functions are just the beginning of what <code>purrr</code> has to offer. In the next section, we will demonstrate how to use these functions to solve real-world problems through practical examples.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h2 class="wp-block-heading has-medium-font-size" id="4243">Practical examples with purrr</h2>



<p id="1de5">In this section, we will explore three practical examples that demonstrate how the <code>purrr</code> package can be used to solve real-world problems efficiently.</p>



<p id="3fb2"><strong>A. Example 1: Calculating summary statistics for multiple variables</strong></p>



<p id="3260">Suppose you have a data frame with multiple numerical variables, and you want to calculate summary statistics (mean, median, and standard deviation) for each of these variables.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.40277099609375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load required packages
library(dplyr)
library(purrr)

# Create a sample data frame
data <- data.frame(
  var1 = rnorm(100, mean = 10, sd = 2),
  var2 = rnorm(100, mean = 20, sd = 5),
  var3 = rnorm(100, mean = 30, sd = 3),
  stringsAsFactors = FALSE
)

# Define a list of summary functions
summary_functions <- list(mean = mean, median = median, sd = sd)

# Calculate summary statistics for each variable using nested map functions
summary_stats <- map_dfr(summary_functions, ~ map_dfc(data, .x), .id = &quot;Statistic&quot;)
print(summary_stats)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Load required packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(dplyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(purrr)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a sample data frame</span></span>
<span class="line"><span style="color: #F8F8F2">data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">data.frame</span><span style="color: #F8F8F2">(</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #FD971F; font-style: italic">var1</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">rnorm</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">100</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">mean</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">10</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">sd</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">),</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #FD971F; font-style: italic">var2</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">rnorm</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">100</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">mean</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">20</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">sd</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">5</span><span style="color: #F8F8F2">),</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #FD971F; font-style: italic">var3</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">rnorm</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">100</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">mean</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">30</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">sd</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">),</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #FD971F; font-style: italic">stringsAsFactors</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">FALSE</span></span>
<span class="line"><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Define a list of summary functions</span></span>
<span class="line"><span style="color: #F8F8F2">summary_functions </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F; font-style: italic">mean</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> mean, </span><span style="color: #FD971F; font-style: italic">median</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> median, </span><span style="color: #FD971F; font-style: italic">sd</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> sd)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Calculate summary statistics for each variable using nested map functions</span></span>
<span class="line"><span style="color: #F8F8F2">summary_stats </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> map_dfr(summary_functions, </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> map_dfc(data, .x), </span><span style="color: #FD971F; font-style: italic">.id</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Statistic&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(summary_stats)</span></span></code></pre></div>
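
<p>If the nested call is hard to follow, it can help to look at the two levels separately. The sketch below is a simplified variant rather than a replacement: it assumes a small hypothetical data frame, <code>scores</code>, with fixed values, and keeps the results as a list of named numeric vectors instead of binding them into a data frame.</p>

<pre class="wp-block-code"><code>library(purrr)

# Small hypothetical data frame with fixed values
scores &lt;- data.frame(
  var1 = c(1, 2, 3, 4),
  var2 = c(10, 20, 30, 40)
)

summary_functions &lt;- list(mean = mean, median = median, sd = sd)

# Outer map(): one entry per statistic
# Inner map_dbl(): one value per column of the data frame
stats_by_function &lt;- map(
  summary_functions,
  function(f) map_dbl(scores, f)
)

print(stats_by_function$mean)  # var1 = 2.5, var2 = 25</code></pre>

<p>Because a data frame is a list of columns, the inner <code>map_dbl()</code> visits each column in turn and keeps the column names on the result.</p>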



<p id="ae64"><strong>B. Example 2: Fitting multiple linear models for different subsets of data</strong></p>



<p id="c80a">In this example, we will fit linear models for different subsets of the <code>mtcars</code> dataset based on the number of cylinders. We will use <code>purrr</code> functions to apply the linear model function to each subset and extract the model coefficients.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.402786254882812px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load required packages
library(dplyr)
library(purrr)
library(broom)

# Split the mtcars dataset by the number of cylinders
mtcars_split <- mtcars %>% group_split(cyl)

# Define a function to fit a linear model and extract coefficients
fit_lm <- function(data) {
  model <- lm(mpg ~ wt, data = data)
  coef <- data.frame(tidy(model)) %>%
    select(term, estimate) %>%
    mutate(cyl = unique(data$cyl))
  return(coef)
}

# Apply the fit_lm function to each subset using map_dfr()
model_coefs <- map_dfr(mtcars_split, fit_lm)
print(model_coefs)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Load required packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(dplyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(purrr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(broom)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Split the mtcars dataset by the number of cylinders</span></span>
<span class="line"><span style="color: #F8F8F2">mtcars_split </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mtcars </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> group_split(cyl)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Define a function to fit a linear model and extract coefficients</span></span>
<span class="line"><span style="color: #A6E22E">fit_lm</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(data) {</span></span>
<span class="line"><span style="color: #F8F8F2">  model </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">lm</span><span style="color: #F8F8F2">(mpg </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> wt, </span><span style="color: #FD971F; font-style: italic">data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> data)</span></span>
<span class="line"><span style="color: #F8F8F2">  coef </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">data.frame</span><span style="color: #F8F8F2">(tidy(model)) </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">    select(term, estimate) </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">    mutate(</span><span style="color: #FD971F; font-style: italic">cyl</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">unique</span><span style="color: #F8F8F2">(data</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">cyl))</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">return</span><span style="color: #F8F8F2">(coef)</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Apply the fit_lm function to each subset using map_dfr()</span></span>
<span class="line"><span style="color: #F8F8F2">model_coefs </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> map_dfr(mtcars_split, fit_lm)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(model_coefs)</span></span></code></pre></div>
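
<p>When only a single coefficient is needed, a lighter variant is possible without <code>dplyr</code> or <code>broom</code>. This is a sketch under that assumption, using base R's <code>split()</code> and <code>coef()</code>; the names <code>by_cyl</code>, <code>models</code>, and <code>wt_slopes</code> are illustrative.</p>

<pre class="wp-block-code"><code>library(purrr)

# split() returns a named list of data frames, one per cylinder count
by_cyl &lt;- split(mtcars, mtcars$cyl)

# Fit one linear model per subset
models &lt;- map(by_cyl, function(df) lm(mpg ~ wt, data = df))

# Extract the slope for wt from each model as a named double vector
wt_slopes &lt;- map_dbl(models, function(m) coef(m)[["wt"]])
print(wt_slopes)</code></pre>

<p>The names of <code>wt_slopes</code> come from the list names created by <code>split()</code>, so each slope stays labeled with its cylinder count.</p>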



<p id="8a7a"><strong>C. Reading Multiple CSV files with purrr</strong></p>



<p id="1f79">Suppose you have multiple CSV files in a directory and you want to read them all into a single data frame using <code>purrr</code>. Here’s an example of how you can achieve this:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.402801513671875px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Define the directory containing the CSV files
csv_directory <- &quot;path/to/your/csv/files&quot;

# List all CSV files in the directory
csv_files <- list.files(csv_directory, pattern = &quot;\\.csv$&quot;, full.names = TRUE)

# Define a function to read a CSV file and add a column with the filename
read_csv_with_filename <- function(file) {
  data <- read_csv(file)
  data <- data %>% mutate(filename = basename(file))
  return(data)
}

# Read all CSV files using map_dfr() and bind the results into a single data frame
combined_data <- map_dfr(csv_files, read_csv_with_filename)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Define the directory containing the CSV files</span></span>
<span class="line"><span style="color: #F8F8F2">csv_directory </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;path/to/your/csv/files&quot;</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># List all CSV files in the directory</span></span>
<span class="line"><span style="color: #F8F8F2">csv_files </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list.files</span><span style="color: #F8F8F2">(csv_directory, </span><span style="color: #FD971F; font-style: italic">pattern</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;\\.csv$&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">full.names</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Define a function to read a CSV file and add a column with the filename</span></span>
<span class="line"><span style="color: #A6E22E">read_csv_with_filename</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(file) {</span></span>
<span class="line"><span style="color: #F8F8F2">  data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> read_csv(file)</span></span>
<span class="line"><span style="color: #F8F8F2">  data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> data </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> mutate(</span><span style="color: #FD971F; font-style: italic">filename</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">basename</span><span style="color: #F8F8F2">(file))</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">return</span><span style="color: #F8F8F2">(data)</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Read all CSV files using map_dfr() and bind the results into a single data frame</span></span>
<span class="line"><span style="color: #F8F8F2">combined_data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> map_dfr(csv_files, read_csv_with_filename)</span></span></code></pre></div>



<p id="9b39">In this example, we first list all the CSV files in the specified directory. Then, we define a custom function <code>read_csv_with_filename()</code> to read each CSV file and add a column with the filename. Finally, we use <code>purrr</code>’s <code>map_dfr()</code> function to apply the custom function to each file in the list and bind the results into a single data frame.</p>
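
<p>The workflow above can be tried end to end without any files of your own. The sketch below is only an illustration: the temporary directory and the two demo files (<code>a.csv</code>, <code>b.csv</code>) are made up for the example, and it assumes <code>readr</code> 2.0 or later for the <code>show_col_types</code> argument.</p>

<pre class="wp-block-code"><code># Load the packages this sketch relies on
library(purrr)
library(readr)
library(dplyr)

# Hypothetical demo data: write two small CSV files to a temporary directory
csv_directory &lt;- file.path(tempdir(), "purrr_csv_demo")
dir.create(csv_directory, showWarnings = FALSE)
write_csv(data.frame(id = 1:2, value = c(10, 20)), file.path(csv_directory, "a.csv"))
write_csv(data.frame(id = 3:4, value = c(30, 40)), file.path(csv_directory, "b.csv"))

# list.files() expects a regular expression, so match the extension at the end of the name
csv_files &lt;- list.files(csv_directory, pattern = "\\.csv$", full.names = TRUE)

# Read one file and record which file each row came from
read_csv_with_filename &lt;- function(file) {
  read_csv(file, show_col_types = FALSE) %&gt;%
    mutate(filename = basename(file))
}

# Apply the reader to every file and row-bind the results
combined_data &lt;- map_dfr(csv_files, read_csv_with_filename)
print(combined_data)</code></pre>

<p>In <code>purrr</code> 1.0.0 and later, <code>map_dfr()</code> is marked as superseded; <code>map()</code> followed by <code>list_rbind()</code> is the suggested replacement and should give the same result here.</p>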



<p id="bd14"><strong>D. purrr and ggplot2</strong></p>



<p id="78fa">In this example, we’ll demonstrate how to use <code>purrr</code> to create multiple ggplots for different subsets of data within a single data frame. We’ll use the <code>mtcars</code> dataset and create separate ggplots for each unique number of cylinders.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.40277099609375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load required packages
library(purrr)
library(ggplot2)
library(dplyr)
library(cowplot)

# Create a list of data frames, one for each unique number of cylinders in the mtcars dataset
data_list <- mtcars %>%
  split(.$cyl)

# Define a function to create a ggplot for a given data frame
create_ggplot <- function(data) {
  ggplot(data, aes(x = mpg, y = hp)) +
    geom_point(aes(color = factor(gear)), size = 3) +
    labs(title = paste(&quot;Number of Cylinders:&quot;, unique(data$cyl)),
         x = &quot;Miles per Gallon&quot;,
         y = &quot;Horsepower&quot;) +
    theme_minimal() +
    theme(legend.title = element_blank()) +
    scale_color_discrete(name = &quot;Gears&quot;)
}

# Create a list of ggplots using map()
ggplot_list <- data_list %>% 
  map(create_ggplot)

# Combine the ggplots into a single plot using cowplot's plot_grid()
combined_plot <- plot_grid(plotlist = ggplot_list, ncol = 1, align = &quot;v&quot;, rel_heights = c(1, 1, 1))

# Display the combined plot
print(combined_plot)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Load required packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(purrr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(ggplot2)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(dplyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(cowplot)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a list of data frames, one for each unique number of cylinders in the mtcars dataset</span></span>
<span class="line"><span style="color: #F8F8F2">data_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mtcars </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #66D9EF">split</span><span style="color: #F8F8F2">(.</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">cyl)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Define a function to create a ggplot for a given data frame</span></span>
<span class="line"><span style="color: #A6E22E">create_ggplot</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(data) {</span></span>
<span class="line"><span style="color: #F8F8F2">  ggplot(data, aes(</span><span style="color: #FD971F; font-style: italic">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> mpg, </span><span style="color: #FD971F; font-style: italic">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> hp)) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">    geom_point(aes(</span><span style="color: #FD971F; font-style: italic">color</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">factor</span><span style="color: #F8F8F2">(gear)), </span><span style="color: #FD971F; font-style: italic">size</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">    labs(</span><span style="color: #FD971F; font-style: italic">title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">paste</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;Number of Cylinders:&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #66D9EF">unique</span><span style="color: #F8F8F2">(data</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">cyl)),</span></span>
<span class="line"><span style="color: #F8F8F2">         </span><span style="color: #FD971F; font-style: italic">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Miles per Gallon&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">         </span><span style="color: #FD971F; font-style: italic">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Horsepower&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">    theme_minimal() </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">    theme(</span><span style="color: #FD971F; font-style: italic">legend.title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> element_blank()) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">    scale_color_discrete(</span><span style="color: #FD971F; font-style: italic">name</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Gears&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a list of ggplots using map()</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> data_list </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> </span></span>
<span class="line"><span style="color: #F8F8F2">  map(create_ggplot)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Combine the ggplots into a single plot using cowplot&#39;s plot_grid()</span></span>
<span class="line"><span style="color: #F8F8F2">combined_plot </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> plot_grid(</span><span style="color: #FD971F; font-style: italic">plotlist</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> ggplot_list, </span><span style="color: #FD971F; font-style: italic">ncol</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">align</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;v&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">rel_heights</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Display the combined plot</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(combined_plot)</span></span></code></pre></div>



<p id="674e">In this example, we first create a list of data frames, one for each unique number of cylinders in the <code>mtcars</code> dataset. Then, we define a custom function <code>create_ggplot()</code> to create a ggplot for a given data frame. The function creates a scatterplot of horsepower (hp) against miles per gallon (mpg), with a title that reflects the number of cylinders.</p>



<p id="1d4e">Finally, we use <code>purrr</code>’s <code>map()</code> function to apply the custom function to each data frame in the list, resulting in a list of ggplots. Rather than printing each plot separately, we stack them into a single figure with <code>plot_grid()</code> from the <code>cowplot</code> package.</p>



<p id="4483">The plot we get can be seen below:</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="720" height="663" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2023/04/1_0zvCrChg56Sq3kBk3neGSg.webp" alt="" class="wp-image-6139" srcset="https://analyticadss.com/wp-content/uploads/2023/04/1_0zvCrChg56Sq3kBk3neGSg.webp 720w, https://analyticadss.com/wp-content/uploads/2023/04/1_0zvCrChg56Sq3kBk3neGSg-500x460.webp 500w, https://analyticadss.com/wp-content/uploads/2023/04/1_0zvCrChg56Sq3kBk3neGSg-150x138.webp 150w" sizes="auto, (max-width: 720px) 100vw, 720px" /></figure>
</div>


<p id="aa70">The <code>create_ggplot()</code> function includes a few choices that improve the aesthetics of the plots:</p>



<ol class="wp-block-list">
<li>We use <code>geom_point(aes(color = factor(gear)), size = 3)</code> to color the points by the number of gears and increase their size.</li>



<li>We apply <code>theme_minimal()</code> to use a minimalistic theme for the plots.</li>



<li>We remove the legend title using <code>theme(legend.title = element_blank())</code>.</li>



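<li>We set the name of the color scale to “Gears” using <code>scale_color_discrete(name = "Gears")</code>. Note that because the previous step blanks the legend title, this name is only displayed if that <code>theme()</code> call is removed.</li>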
</ol>



<p id="f475">Finally, we use the <code>plot_grid()</code> function from the <code>cowplot</code> package to combine the ggplots in the <code>ggplot_list</code> into a single plot with one column and display the combined plot.</p>
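
<p>If separate plots are preferred over a grid, the same list-based approach still applies. The following is a minimal sketch rather than part of the original example: it uses <code>imap()</code> so that each plot title comes from the list name, and <code>walk()</code> to print each plot for its side effect. The simplified plot (no colors or theme) is an assumption made to keep the example short.</p>

<pre class="wp-block-code"><code>library(purrr)
library(ggplot2)

# One data frame per cylinder count; the list names are "4", "6" and "8"
data_list &lt;- split(mtcars, mtcars$cyl)

# imap() passes each element as .x and its name as .y
ggplot_list &lt;- imap(data_list, ~ ggplot(.x, aes(x = mpg, y = hp)) +
  geom_point() +
  labs(title = paste("Number of Cylinders:", .y)))

# walk() calls print() on each plot and returns its input invisibly
walk(ggplot_list, print)</code></pre>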



<p id="2900">These examples showcase how the <code>purrr</code> package can help you write more efficient and readable code, making your data analysis workflows more robust and maintainable. By incorporating <code>purrr</code> into your R projects, you can take full advantage of functional programming techniques and harness their power to solve complex problems.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h2 class="wp-block-heading has-medium-font-size" id="5467">Tips and Best Practices for Using purrr</h2>



<p id="e68e">In this final section, we will share some tips and best practices for using the <code>purrr</code> package in your R projects. These recommendations will help you write more efficient, readable, and maintainable code.</p>



<p id="6bbd"><strong>1. Use anonymous functions when appropriate</strong></p>



<p id="5346">When using <code>map()</code> functions, you can create anonymous functions using the <code>~</code> notation, which allows for concise and readable code. However, if the function becomes too complex or is used multiple times, consider defining it as a separate named function for better code organization and readability.</p>
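
<p>As a small illustration of that trade-off, the sketch below computes the same result twice, once with the <code>~</code> shorthand and once with a named function. The helper name <code>square</code> is made up for the example.</p>

<pre class="wp-block-code"><code>library(purrr)

# Anonymous function with the ~ shorthand; .x is the current element
squares_formula &lt;- map_dbl(1:5, ~ .x^2)

# The same logic as a named function, easier to reuse and test
square &lt;- function(x) x^2
squares_named &lt;- map_dbl(1:5, square)

# Both approaches give the same numeric vector
identical(squares_formula, squares_named)</code></pre>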



<p id="aafb"><strong>2. Leverage the power of function composition</strong></p>



<p id="d747">The <code>compose()</code> function allows you to create new functions by combining existing ones. This technique promotes code reusability and makes it easier to build complex functionality by breaking it down into simpler, more manageable parts.</p>
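
<p>Here is a minimal sketch of composition, using two base R functions. The name <code>clean_text</code> is hypothetical. By default <code>compose()</code> applies its functions from right to left, so the last function listed runs first.</p>

<pre class="wp-block-code"><code>library(purrr)

# trimws() runs first, then toupper() is applied to its result
clean_text &lt;- compose(toupper, trimws)

# Use the composed function like any other function, including inside map_chr()
cleaned &lt;- map_chr(c("  purrr ", " dplyr"), clean_text)
print(cleaned)</code></pre>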



<p id="76d8"><strong>3. Handle errors gracefully</strong></p>



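<p id="21ec">When applying a function to a list or vector, use <code>safely()</code> or <code>possibly()</code> to handle errors without stopping the execution of your code: <code>safely()</code> returns both the result and the error for each call, while <code>possibly()</code> substitutes a default value when a call fails. The related <code>quietly()</code> does not catch errors; it captures printed output, messages, and warnings so you can inspect them afterwards. This approach keeps your code robust when it meets unexpected input values.</p>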
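
<p>The sketch below shows both wrappers on a deliberately bad input. The input list and the variable names are made up for illustration; <code>log("a")</code> fails because its argument is not numeric.</p>

<pre class="wp-block-code"><code>library(purrr)

# The second element will make log() throw an error
inputs &lt;- list(10, "a", 100)

# possibly(): replace any failure with a default value
log_or_na &lt;- possibly(log, otherwise = NA_real_)
logs &lt;- map_dbl(inputs, log_or_na)

# safely(): keep both the result and the error for every call
results &lt;- map(inputs, safely(log))

# Indices of the inputs that failed
failed &lt;- which(map_lgl(results, ~ !is.null(.x$error)))
print(logs)
print(failed)</code></pre>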



<p id="97a5"><strong>4. Know when to use purrr vs. base R or dplyr</strong></p>



<p id="aac3">While <code>purrr</code> provides a powerful and flexible way to manipulate data, there are cases where base R or <code>dplyr</code> functions may be more appropriate or efficient. For example, if you need to perform simple operations on a data frame, consider using <code>dplyr</code> functions like <code>mutate()</code> or <code>summarize()</code>. Evaluate the needs of your specific task and choose the best tool for the job.</p>
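
<p>To make the comparison concrete, the sketch below computes the mean miles per gallon per cylinder count in two ways. Both routes are valid; the variable names are made up for the example, and which one reads better is a judgment call.</p>

<pre class="wp-block-code"><code>library(purrr)
library(dplyr)

# purrr route: split the data, then map over the pieces
by_map &lt;- mtcars %&gt;%
  split(.$cyl) %&gt;%
  map_dbl(~ mean(.x$mpg))

# dplyr route: a grouped summary keeps everything in one data frame
by_dplyr &lt;- mtcars %&gt;%
  group_by(cyl) %&gt;%
  summarize(mean_mpg = mean(mpg))

print(by_map)
print(by_dplyr)</code></pre>

<p>For a simple grouped summary like this one, the <code>dplyr</code> version is shorter and returns a data frame directly, which is why it is often the more natural choice.</p>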



<p id="d5c9"><strong>5. Familiarize yourself with the purrr documentation</strong></p>



<p id="57fa">The <code>purrr</code> package has a wealth of functions and features that can help you streamline your code and solve complex problems. Make sure to consult the official documentation (<a href="https://purrr.tidyverse.org/" rel="noreferrer noopener" target="_blank">https://purrr.tidyverse.org/</a>) to explore its full capabilities and discover new techniques.</p>



<p id="eac3">By following these tips and best practices, you can fully leverage the power of the <code>purrr</code> package in your R projects, making your code more efficient, readable, and maintainable. Embrace the functional programming paradigm and use <code>purrr</code> to solve real-world data analysis challenges with ease.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h1 class="wp-block-heading" id="c363">Wrapping up</h1>



<p id="6914">Throughout this article, we’ve explored the capabilities and adaptability of R’s <code>purrr</code> package for functional programming and data handling. We covered foundational functional programming principles, the central role of the <code>map()</code> family of functions, and more advanced topics such as working with nested data and handling errors.</p>



<p id="75dd">Using real-world scenarios, we’ve shown how <code>purrr</code> can simplify daunting tasks, streamline your scripts, and make them easier to read and maintain. Incorporating <code>purrr</code> into your R toolkit makes data manipulation and analysis go more smoothly.</p>



<p id="78e2">As you explore the <code>purrr</code> package further, bear in mind that mastery comes with practice. Experiment, and look for creative ways to apply <code>purrr</code> functions in your own projects. With perseverance, you’ll develop a solid grasp of its details and become more proficient at data management in R.</p>



<p id="15ba">Happy coding!</p>



<p id="b396"><strong>Further Reading and Exploration:</strong></p>



<p id="79d8">For those eager to expand their expertise on <code>purrr</code> and R’s functional programming, consider the following treasure trove of resources:</p>



<ol class="wp-block-list">
<li><code>purrr</code>’s Official Guide: As a logical first step, the <code>purrr</code> package’s official documentation provides a thorough overview of all it offers. Dive into the nuances at <code>purrr</code>’s <a href="https://purrr.tidyverse.org/" rel="noreferrer noopener" target="_blank">official site</a>.</li>



<li>R for Data Science: Written by Hadley Wickham and Garrett Grolemund, this online book offers an extensive look into R’s role in data science. Notably, it features a segment dedicated to functional programming with <code>purrr</code>. Read it <a href="https://r4ds.had.co.nz/" rel="noreferrer noopener" target="_blank">here</a>.</li>



<li>Advanced R: A deeper dive by Hadley Wickham, “Advanced R” ventures into the more intricate aspects of R, shedding light on advanced functional programming paradigms. Embark on this advanced journey <a href="https://adv-r.hadley.nz/" rel="noreferrer noopener" target="_blank">here</a>.</li>



<li>RStudio’s Vibrant Community: Seeking advice, hoping to discuss new findings, or simply aiming to network? The RStudio community is a hub of enthusiasts, experts, and curious minds. Engage with like-minded individuals <a href="https://community.rstudio.com/" rel="noreferrer noopener" target="_blank">right here</a>.</li>
</ol>



<p id="238c">Harnessing these resources and proactively mingling with the wider R circle will undoubtedly refine your prowess with both the <code>purrr</code> package and R’s functional programming realm. Continue your journey of discovery, trial, and collaborative learning to blossom as an adept data scientist and R aficionado.</p>
<p>The post <a href="https://analyticadss.com/unleash-the-power-of-functional-programming-in-r-with-the-purrr-package/">Unleash the Power of Functional Programming in R with the purrr Package</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The One Mistake You’re Making in Starting a Data Science Project (And How to Avoid It)</title>
		<link>https://analyticadss.com/the-one-mistake-youre-making-in-starting-a-data-science-project-and-how-to-avoid-it/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Fri, 30 Dec 2022 19:17:35 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Data Science Careers]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=5813</guid>

					<description><![CDATA[<p>“The One Mistake You’re Making in Starting a Data Science Project (And How to Avoid It)” Building an Analytics portfolio project can be a daunting task, especially for those who are new to data analytics. In a survey we conducted at Analytica Data Science Solutions, we found that most people struggled with knowing where to [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/the-one-mistake-youre-making-in-starting-a-data-science-project-and-how-to-avoid-it/">The One Mistake You’re Making in Starting a Data Science Project (And How to Avoid It)</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[



<p id="d3b8">Building an analytics portfolio project can be a daunting task, especially for those who are new to data analytics. In a survey we conducted at Analytica Data Science Solutions, we found that most people struggled with knowing where to begin. Often, people find a data set that interests them and try to build a project around it from scratch. This can lead to wasted time and, in some cases, to abandoning the project altogether.</p>



<p id="2d29">So, where do you start when it comes to a data analytics project? As a data scientist with experience in numerous projects, I’ve learned that the key is to start with the problem, not the solution. Just like a business venture, you don’t start with a solution and try to sell it. Instead, you start with the core problem and develop a solution for it. The days of “Build it and they will come” are long gone!</p>



<p id="3945">For example, the founders of YouTube identified a problem in the early 2000s: there was no platform to share video clips. So they built YouTube to solve this problem. Similarly, during the onset of the COVID-19 pandemic, there was a need to understand how the virus was spreading geographically. A PhD student at Johns Hopkins University developed a famous dashboard to solve this problem, which is now being used by governments and agencies around the world.</p>



<h2 class="wp-block-heading">Where can I find project ideas?</h2>



<p id="f1da">To find inspiration for data analytics projects, it’s important to look at what problems others are trying to solve. This can be through reading blog posts or research papers, or checking out platforms like Tableau Public or GitHub. Without investing time in understanding the problems that can be solved with data analytics, you may never reach your full potential in actually solving those problems.</p>



<p id="4ff8">Once you have identified a problem to solve, the next step is to define your project’s scope and objectives. This will help you stay focused and on track as you move forward with your project. From there, you can start to gather and clean your data, and then move on to the analysis and visualization stages.</p>



<p id="c90d">Remember, starting a data analytics project is not about the data set, it’s about the problem you are trying to solve. By beginning with the problem and following a structured approach, you can successfully complete your portfolio project and showcase your skills to potential employers or clients.</p>



<p>Read more posts on the AnalyticaDSS blog: <a href="https://analyticadss.com/blog">BLOGS</a></p>



<p>Read more posts on Medium: <a href="https://medium.com/@aousabdo">Medium Blogs</a></p>
<p>The post <a href="https://analyticadss.com/the-one-mistake-youre-making-in-starting-a-data-science-project-and-how-to-avoid-it/">The One Mistake You’re Making in Starting a Data Science Project (And How to Avoid It)</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The Tidyverse and data.table R Packages</title>
		<link>https://analyticadss.com/the-tidyverse-and-data-table-r-packages/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Sun, 14 Feb 2021 15:21:31 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[R Statistical Language]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Tidyverse]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=4821</guid>

					<description><![CDATA[<p>“The Tidyverse and data.table R Packages” The power of R comes from the vast collection of software libraries, i.e. packages, that can be easily installed and loaded in R. Today we will cover two of the most powerful packages in R, the tidyverse and data.table packages. The tidyverse and data.table are two popular packages in R that provide functions for working with data. [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/the-tidyverse-and-data-table-r-packages/">The Tidyverse and data.table R Packages</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[



<p id="73a3">The power of R comes from the vast collection of software libraries, i.e. packages, that can be easily installed and loaded in R. Today we will cover two of the most powerful packages in R, the <strong><code>tidyverse</code> </strong>and <code><strong>data.table</strong></code> packages.</p>



<p id="12a6">The <strong><code>tidyverse</code> </strong>and <strong><code>data.table</code> </strong>are two popular packages in R that provide functions for working with data. They both have their own strengths and are suitable for different types of tasks.</p>



<p id="df56">The <strong><code>tidyverse</code> </strong>is a collection of packages designed for data manipulation, visualization, and modeling. It is based on the principles of tidy data, which hold that data should be structured in a way that makes it easy to work with. The <strong><code>tidyverse</code> </strong>includes packages such as <code><strong>dplyr</strong></code>, <code><strong>tidyr</strong></code>, and <code>ggplot2</code>, which provide functions for data manipulation, cleaning, and visualization.</p>



<p id="1c2b">One of the main advantages of the <strong><code>tidyverse</code> </strong>is its simplicity. The functions in the <strong><code>tidyverse</code> </strong>are easy to learn and use, and they often require fewer lines of code compared to other packages. They also have a consistent syntax, which makes it easier to learn and use multiple functions.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="c2b4">Tidyverse Examples</h2>



<p id="2da7">Here are some examples of how to use the <code><strong>tidyverse</strong></code>:</p>



<p id="2df8">To select specific columns from a dataset:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse package
library(tidyverse)

# Load the mpg dataset from the ggplot2 package
data(mpg)

# Select the &quot;manufacturer&quot; and &quot;model&quot; columns
mpg %>% select(manufacturer, model)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Select the &quot;manufacturer&quot; and &quot;model&quot; columns</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> select(manufacturer, model)</span></span></code></pre></div>



<p id="8924">And to group and summarize a dataset:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse package
library(tidyverse)

# Load the mpg dataset from the ggplot2 package
data(mpg)

# Group the dataset by &quot;class&quot; and compute the mean of the &quot;hwy&quot; column
mpg %>% group_by(class) %>% summarize(mean_hwy = mean(hwy))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Group the dataset by &quot;class&quot; and compute the mean of the &quot;hwy&quot; column</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> group_by(class) </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> summarize(</span><span style="color: #FD971F">mean_hwy</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">mean</span><span style="color: #F8F8F2">(hwy))</span></span></code></pre></div>



<p id="d645">To join two datasets:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse package
library(tidyverse)

# Load mpg from ggplot2, then build a per-manufacturer lookup table (ggplot2 has no &quot;cylinders&quot; dataset)
data(mpg)
cylinders <- mpg %>% group_by(manufacturer) %>% summarize(mean_cyl = mean(cyl))

# Join the mpg and cylinders datasets on the &quot;manufacturer&quot; column
mpg %>% left_join(cylinders, by = &quot;manufacturer&quot;)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load mpg from ggplot2, then build a per-manufacturer lookup table (ggplot2 has no &quot;cylinders&quot; dataset)</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg)</span></span>
<span class="line"><span style="color: #F8F8F2">cylinders </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mpg </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> group_by(manufacturer) </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> summarize(</span><span style="color: #FD971F">mean_cyl</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">mean</span><span style="color: #F8F8F2">(cyl))</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Join the mpg and cylinders datasets on the &quot;manufacturer&quot; column</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> left_join(cylinders, </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;manufacturer&quot;</span><span style="color: #F8F8F2">)</span></span></code></pre></div>



<p id="68fd">To perform a linear regression using the <code>lm</code> function from the <code>stats</code> package:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse and stats packages
library(tidyverse)
library(stats)

# Load the mtcars dataset
data(mtcars)

# Perform a linear regression to predict mpg (miles per gallon) using wt (weight) as the predictor variable
fit <- mtcars %>% 
  lm(mpg ~ wt, data = .)

# Summarize the model results
summary(fit)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse and stats packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(stats)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mtcars dataset</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mtcars)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Perform a linear regression to predict mpg (miles per gallon) using wt (weight) as the predictor variable</span></span>
<span class="line"><span style="color: #F8F8F2">fit </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mtcars </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #66D9EF">lm</span><span style="color: #F8F8F2">(mpg </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> wt, </span><span style="color: #FD971F">data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> .)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Summarize the model results</span></span>
<span class="line"><span style="color: #66D9EF">summary</span><span style="color: #F8F8F2">(fit)</span></span></code></pre></div>



<p id="9d7e">Create a scatterplot matrix using the <code>scatterplotMatrix</code> function from the <code>car</code> package:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse and car packages
library(tidyverse)
library(car)

# Load the iris dataset
data(iris)

# Create a scatterplot matrix of the iris dataset
scatterplotMatrix(iris, smooth = FALSE)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse and car packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(car)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the iris dataset</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(iris)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a scatterplot matrix of the iris dataset</span></span>
<span class="line"><span style="color: #F8F8F2">scatterplotMatrix(iris, </span><span style="color: #FD971F">smooth</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">FALSE</span><span style="color: #F8F8F2">)</span></span></code></pre></div>



<p id="9cfb">Create a faceted histogram using <code><strong>ggplot2</strong></code>:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse package
library(tidyverse)

# Load the mpg dataset from the ggplot2 package
data(mpg)

# Create a faceted histogram showing the distribution of hwy (highway miles per gallon) by class and drv (drive type)
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~ class + drv, nrow = 2)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a faceted histogram showing the distribution of hwy (highway miles per gallon) by class and drv (drive type)</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot(mpg, aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> hwy)) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  geom_histogram(</span><span style="color: #FD971F">binwidth</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  facet_wrap(</span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> class </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> drv, </span><span style="color: #FD971F">nrow</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">)</span></span></code></pre></div>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="804c">data.table Examples</h2>



<p id="17bf">The <code><strong>data.table</strong></code> package, on the other hand, is a high-performance package for working with large datasets. It provides functions for manipulating and querying data efficiently. Like most R tools, it works on data held in memory, and it is particularly useful when datasets are large enough that copying them is costly, because it can modify tables by reference, or when you need to perform complex operations on large datasets.</p>



<h4 class="wp-block-heading">Main advantage of the <code><strong>data.table</strong></code> package: speed</h4>



<p id="9709">One of the main advantages of the <code><strong>data.table</strong></code> package is its speed. The functions in the <code><strong>data.table</strong></code> package are generally faster than their counterparts in the <code><strong>tidyverse</strong></code>, especially when working with large datasets.</p>
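<p>Part of that speed comes from the compact <code>dt[i, j, by]</code> form and from the <code>:=</code> operator, which adds or updates columns by reference rather than copying the table. The sketch below shows both on the built-in <code>mtcars</code> data. The names <code>dt</code>, <code>heavy_by_cyl</code>, and <code>kpl</code> are assumptions introduced here, and actual speed differences depend on data size and the operation.</p>

```r
library(data.table)

# Copy the built-in mtcars data into a data.table, keeping car names as a column
dt <- as.data.table(mtcars, keep.rownames = "car")

# General form dt[i, j, by]: filter rows (i), compute (j), per group (by)
heavy_by_cyl <- dt[wt > 3, .(mean_mpg = mean(mpg), n = .N), by = cyl]

# := adds a column by reference, without copying the whole table
dt[, kpl := mpg * 0.425144]

heavy_by_cyl
```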



<p id="d980">Here are some examples of how to use the <strong><code>data.table</code></strong> package:</p>



<p id="9202">To select specific columns from a dataset:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table package
library(data.table)

# Load the mpg dataset from the ggplot2 package
data(mpg, package = &quot;ggplot2&quot;)

# Convert the dataset to a data.table
mpg <- as.data.table(mpg)

# Select the &quot;manufacturer&quot; and &quot;model&quot; columns
mpg[, .(manufacturer, model)]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg, </span><span style="color: #FD971F">package</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;ggplot2&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Select the &quot;manufacturer&quot; and &quot;model&quot; columns</span></span>
<span class="line"><span style="color: #F8F8F2">mpg[, .(manufacturer, model)]</span></span></code></pre></div>



<p id="9768">And to group and summarize a dataset:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table package
library(data.table)

# Load the mpg dataset from the ggplot2 package
data(mpg, package = &quot;ggplot2&quot;)

# Convert the dataset to a data.table
mpg <- as.data.table(mpg)

# Group the dataset by &quot;class&quot; and compute the mean of the &quot;hwy&quot; column
mpg[, .(mean_hwy = mean(hwy)), by = class]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg, </span><span style="color: #FD971F">package</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;ggplot2&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Group the dataset by &quot;class&quot; and compute the mean of the &quot;hwy&quot; column</span></span>
<span class="line"><span style="color: #F8F8F2">mpg[, .(</span><span style="color: #FD971F">mean_hwy</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">mean</span><span style="color: #F8F8F2">(hwy)), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> class]</span></span></code></pre></div>



<p id="b569">To join two datasets:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.39581298828125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table package
library(data.table)

# Load the mpg dataset from the ggplot2 package
data(mpg, package = &quot;ggplot2&quot;)

# Convert mpg to a data.table, then build a per-manufacturer lookup table
# (ggplot2 has no &quot;cylinders&quot; dataset)
mpg <- as.data.table(mpg)
cylinders <- mpg[, .(mean_cyl = mean(cyl)), by = manufacturer]

# Join the mpg and cylinders datasets on the &quot;manufacturer&quot; column
mpg[cylinders, on = &quot;manufacturer&quot;]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg, </span><span style="color: #FD971F">package</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;ggplot2&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert mpg to a data.table, then build a per-manufacturer lookup table</span></span>
<span class="line"><span style="color: #88846F"># (ggplot2 has no &quot;cylinders&quot; dataset)</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(mpg)</span></span>
<span class="line"><span style="color: #F8F8F2">cylinders </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mpg[, .(</span><span style="color: #FD971F">mean_cyl</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">mean</span><span style="color: #F8F8F2">(cyl)), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> manufacturer]</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Join the mpg and cylinders datasets on the &quot;manufacturer&quot; column</span></span>
<span class="line"><span style="color: #F8F8F2">mpg[cylinders, </span><span style="color: #FD971F">on</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;manufacturer&quot;</span><span style="color: #F8F8F2">]</span></span></code></pre></div>
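<p>Join direction is easy to get wrong here: in the <code>X[Y, on = ...]</code> form, the result keeps every row of <code>Y</code> (the table inside the brackets), which is the reverse of <code>left_join(X, Y)</code>. The sketch below uses two tiny tables, <code>x</code> and <code>y</code>, invented for illustration, to make the difference visible.</p>

```r
library(data.table)

# Two small hypothetical tables (made-up values)
x <- data.table(id = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.table(id = c(2, 3, 4), b = c("y2", "y3", "y4"))

# x[y]: keeps every row of y (ids 2, 3, 4); unmatched columns from x are NA
x_on_y <- x[y, on = "id"]

# y[x]: keeps every row of x (ids 1, 2, 3), the same rows as dplyr::left_join(x, y)
y_on_x <- y[x, on = "id"]

# merge() with all.x = TRUE is another way to write the left join of x with y
left <- merge(x, y, by = "id", all.x = TRUE)

x_on_y
y_on_x
left
```

<p>Note that <code>y[x, on = "id"]</code> and <code>merge(x, y, all.x = TRUE)</code> return the same rows but order the columns differently.</p>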



<p id="4fbd">To perform a linear regression on a <code><strong>data.table</strong></code> using the <code><strong>lm</strong></code> function from the <code>stats</code> package:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table and stats packages
library(data.table)
library(stats)

# Load the mtcars dataset
data(mtcars)

# Convert the dataset to a data.table
mtcars <- setDT(mtcars)

# Perform a linear regression to predict mpg (miles per gallon) using wt (weight) as the predictor variable
fit <- mtcars[, lm(mpg ~ wt)]

# Summarize the model results
summary(fit)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table and stats packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(stats)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mtcars dataset</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mtcars)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">mtcars </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> setDT(mtcars)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Perform a linear regression to predict mpg (miles per gallon) using wt (weight) as the predictor variable</span></span>
<span class="line"><span style="color: #F8F8F2">fit </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mtcars[, </span><span style="color: #66D9EF">lm</span><span style="color: #F8F8F2">(mpg </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> wt)]</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Summarize the model results</span></span>
<span class="line"><span style="color: #66D9EF">summary</span><span style="color: #F8F8F2">(fit)</span></span></code></pre></div>



<p id="1bd0">Create a scatterplot matrix using the <code><strong>scatterplotMatrix</strong></code> function from the <strong><code>car</code> </strong>package and the <code><strong>data.table</strong></code> package:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table and car packages
library(data.table)
library(car)

# Load the iris dataset
data(iris)

# Convert the dataset to a data.table
iris <- as.data.table(iris)

# Create a scatterplot matrix of the iris dataset
scatterplotMatrix(iris, smooth = FALSE)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table and car packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(car)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the iris dataset</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(iris)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">iris </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(iris)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a scatterplot matrix of the iris dataset</span></span>
<span class="line"><span style="color: #F8F8F2">scatterplotMatrix(iris, </span><span style="color: #FD971F">smooth</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">FALSE</span><span style="color: #F8F8F2">)</span></span></code></pre></div>



<p id="007f">Create a faceted histogram using <strong><code>ggplot2</code> </strong>and the <code><strong>data.table</strong></code> package:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table and ggplot2 packages
library(data.table)
library(ggplot2)

# Load the mpg dataset from the ggplot2 package
data(mpg)

# Convert the dataset to a data.table
mpg <- as.data.table(mpg)

# Create a faceted histogram showing the distribution of hwy (highway miles per gallon) by class and drv (drive type)
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~ class + drv, nrow = 2)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table and ggplot2 packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(ggplot2)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a faceted histogram showing the distribution of hwy (highway miles per gallon) by class and drv (drive type)</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot(mpg, aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> hwy)) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  geom_histogram(</span><span style="color: #FD971F">binwidth</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  facet_wrap(</span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> class </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> drv, </span><span style="color: #FD971F">nrow</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">)</span></span></code></pre></div>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p id="f013">In terms of implementation, both the <strong><code>tidyverse</code> </strong>and <code><strong>data.table</strong></code> packages are used from R, but the performance-critical routines in the <code><strong>data.table</strong></code> package are implemented in C for speed.</p>
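<p>To make the performance point concrete, here is a minimal sketch that times a grouped mean in <code>data.table</code> against the same aggregation in base R. The simulated data, the column names <code>group</code> and <code>value</code>, and the row count are assumptions chosen for illustration; actual timings depend on your machine and data, so treat this as a way to run your own comparison rather than as a benchmark result.</p>

```r
library(data.table)

# Simulated data (an assumption for illustration): one million rows, 26 groups
set.seed(42)
n  <- 1e6
dt <- data.table(group = sample(letters, n, replace = TRUE),
                 value = rnorm(n))

# Grouped mean with data.table; keyby also sorts the result by group
dt_time <- system.time(
  dt_means <- dt[, .(mean_value = mean(value)), keyby = group]
)

# The same aggregation in base R, for comparison
base_time <- system.time(
  base_means <- tapply(dt$value, dt$group, mean)
)

# Both approaches should agree on the group means
same_result <- isTRUE(all.equal(dt_means$mean_value, as.numeric(base_means)))

# Elapsed seconds for each approach (values will vary between runs)
timings <- c(data.table = dt_time[["elapsed"]], base = base_time[["elapsed"]])
print(same_result)
print(timings)
```

<p>On small inputs the difference is usually negligible; the gap tends to matter only as the number of rows and groups grows.</p>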



<h2 class="wp-block-heading">In summary</h2>



<p id="4b51">The <code><strong>tidyverse</strong></code> and <code><strong>data.table</strong></code> are two popular options in R for working with data. The <strong><code>tidyverse</code> </strong>is a collection of packages designed for data manipulation, visualization, and modeling, and it is particularly suitable for tasks where simplicity and ease of use matter most. Its functions are easy to learn and use, and they often require fewer lines of code than other packages.</p>



<p id="9f5e">The <code><strong>data.table</strong></code> package is a high-performance package that is particularly useful when you work with large datasets or need to perform complex operations on them. Its functions are generally faster than their counterparts in the <strong><code>tidyverse</code></strong>, especially on large datasets.</p>



<p id="612d">In general, it is a good idea to use the <strong><code>tidyverse</code> </strong>for most tasks, unless you are working with very large datasets or need the extra performance provided by the <code><strong>data.table</strong></code> package.</p>



<h4 class="wp-block-heading">At Analytica</h4>



<p id="4600">Because we deal with larger datasets, ranging from gigabytes to terabytes of data, our preferred tool for data wrangling in R is in fact <code><strong>data.table</strong></code>.</p>
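<p>As a rough illustration of that kind of workflow, the sketch below reads a file with <code>fread</code>, adds a column by reference, and aggregates by group. The temporary CSV file and the column names <code>id</code>, <code>group</code>, and <code>value</code> are hypothetical and exist only so the example runs end to end; this is not our production code.</p>

```r
library(data.table)

# Hypothetical example data written to a temporary file so the sketch is runnable
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(id    = 1:1000,
                  group = rep(c("a", "b"), 500),
                  value = (1:1000) / 10),
       tmp)

# fread can load only the columns you need, which helps with wide files
dt <- fread(tmp, select = c("group", "value"))

# Add a column by reference (no copy of the table is made)
dt[, value_sq := value^2]

# Aggregate by group
totals <- dt[, .(total = sum(value), n_rows = .N), by = group]
print(totals)

# Clean up the temporary file
unlink(tmp)
```

<p>Reading only the needed columns and modifying tables by reference are two of the habits that keep memory use down as files grow.</p>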



<p id="9cc4">I hope this article helps the reader understand the differences between the <strong><code>tidyverse</code> </strong>and <code><strong>data.table</strong></code> in R, and how to choose the right package for their tasks. Let me know if you have any questions.</p>



<p>The post <a href="https://analyticadss.com/the-tidyverse-and-data-table-r-packages/">The Tidyverse and data.table R Packages</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Analyzing Crypto Market using R — Part 2</title>
		<link>https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-2/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Mon, 24 Dec 2018 10:34:58 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[R Statistical Language]]></category>
		<category><![CDATA[Bitcoin]]></category>
		<category><![CDATA[Cryptocurrency]]></category>
		<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[R]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=4907</guid>

					<description><![CDATA[<p>Correlations in the Crypto World. Analyzing the crypto market. Aous Abdo, WWW.ANALYTICADSS.COM. An interactive version of this post can be found here. In my previous post I explored bitcoin data from different exchanges and covered some arbitrage-related data. In part 2 of this series I will explore alt-coin-related data. R Libraries: Below is a list [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-2/">Analyzing Crypto Market using R — Part 2</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading" id="bf86">Correlations in the Crypto World</h2>



<p>Analyzing crypto market</p>



<p><a href="https://medium.com/u/4f20dbfad286?source=post_page-----b1a0aa44006e--------------------------------" rel="noreferrer noopener" target="_blank">Aous Abdo</a>, <a href="http://www.analyticadss.com/" rel="noreferrer noopener" target="_blank">WWW.ANALYTICADSS.COM</a><br>An interactive version of this post can be found <a href="https://analyticadss.com/adss_blog/crypto_notebook_part2.nb.html" rel="noreferrer noopener" target="_blank">here</a>.</p>



<p id="ae2d">In my previous post I explored bitcoin data from different exchanges and covered some arbitrage-related data. In part 2 of this series I will explore alt-coin-related data.</p>



<h2 class="wp-block-heading" id="3a8a">R Libraries</h2>



<p id="46c5">Below is a list of the R libraries we will use in this analysis. Not all of them are strictly necessary, but they all make our lives easier.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="library(PoloniexR)
library(data.table)
library(lubridate)
library(Quandl)
library(plyr)
library(stringr)
library(ggplot2)
library(plotly)
library(janitor)
library(quantmod)
library(pryr)
library(corrplot)
library(PerformanceAnalytics)
library(tidyr)
library(MLmetrics)
library(tidyquant)
library(corrr)
library(cowplot)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(PoloniexR)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(lubridate)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(Quandl)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(plyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(stringr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(ggplot2)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(plotly)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(janitor)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(quantmod)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(pryr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(corrplot)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(PerformanceAnalytics)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(MLmetrics)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyquant)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(corrr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(cowplot)</span></span></code></pre></div>
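<p>Since that is a long list, it can help to check which of these packages are already installed before loading them. The following is a minimal sketch using base R only; the variable names <code>pkgs</code> and <code>missing_pkgs</code> are my own, and the sketch deliberately only reports missing packages rather than installing anything.</p>

```r
# Packages used in this post
pkgs <- c("PoloniexR", "data.table", "lubridate", "Quandl", "plyr", "stringr",
          "ggplot2", "plotly", "janitor", "quantmod", "pryr", "corrplot",
          "PerformanceAnalytics", "tidyr", "MLmetrics", "tidyquant", "corrr",
          "cowplot")

# requireNamespace() returns TRUE when a package is installed and can be loaded,
# without attaching it to the search path
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that is missing
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are installed.")
}
```

<p>Any package that shows up as missing will need to be installed before the corresponding <code>library()</code> call will succeed.</p>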



<h2 class="wp-block-heading" id="7a90">Data</h2>



<p id="f5c0">The best source I know of for alt-coin data is <a href="https://cran.r-project.org/web/packages/PoloniexR/index.html" rel="noreferrer noopener" target="_blank">PoloniexR</a>. I have written an R function to help download the data.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:23.104170322418213px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="get_alt_data <- function(tz = &quot;UTC&quot;
                         , coin = c(&quot;ETH&quot;, &quot;LTC&quot;)
                         , add_bitcoin = TRUE
                         , return_in_USDT = TRUE
                         , from = &quot;2017-01-01&quot;
                         , to = &quot;2018-04-09&quot;
                         , period = &quot;D&quot;
                         , verbose = FALSE){
  
  # We will be using the public API
  poloniex.public <- PoloniexPublicAPI()
  
  # set the time zone to utc
  Sys.setenv(TZ = tz)
  
  # convert from and to into time obj
  from  <- as.POSIXct(paste(from, tz, sep = &quot;&quot;))
  to    <- as.POSIXct(paste(to, tz, sep = &quot;&quot;))
  
  # lists to store data.tables and xts objects
  chart_list <- list()
  dt_list    <- list()
  
  # make sure the coin pair is in upper case
  coin       <- toupper(coin)
  coin_pairs <- paste0(&quot;BTC_&quot;, coin[coin != &quot;BTC&quot;])
  if(add_bitcoin | return_in_USDT) coin_pairs <- c(&quot;USDT_BTC&quot;, coin_pairs)
  
  # loop over the coins to get the data
  for(i in coin_pairs){
    if(verbose)
      invisible(cat('\tGetting data for ', i, ' pair\n'))
    
    # this is a list that will contain the chart data for each coin pair
    try(chart_list[[i]] <- ReturnChartData(theObject = poloniex.public
                                       , pair      = i
                                       , from      = from
                                       , to        = to
                                       , period    = period)
        , silent = TRUE)
    
    # list to contain data.tables 
    try(dt_list[[i]] <- as.data.table(chart_list[[i]]), silent = TRUE)
  }
  
  # convert to data.table and make sure to add a column containing the pairs
  coin_dt <- rbindlist(l = dt_list, use.names = TRUE, idcol = &quot;pair&quot;)
  
  # return data in usdt prices
  if(return_in_USDT){
    # to get the price of the alt coin in usdt is not that simple but we'll do it
    # get a DT of the btc_usdt pair
    btc_usd <- coin_dt[pair == &quot;USDT_BTC&quot;]
    btc_usd <- btc_usd[, .(index, pair, weightedaverage)]
    setnames(btc_usd, c(&quot;Date&quot;, &quot;USDT_BTC_pair&quot;, &quot;USDT_BTC_price&quot;))
    
    # get DT with only alt coins
    alt_coins <- copy(coin_dt)#[pair != &quot;USDT_BTC&quot;]
    
    # now we need to add an index to the alt_coins table, but first we have to rename the index column
    alt_coins[, Date := index]
    alt_coins[, index := 1:.N]
    setkey(alt_coins, index)
    
    # now merge the data tables
    coin_dt_usdt <- merge(x = alt_coins, y = btc_usd, by = &quot;Date&quot;)
    
    # now calculate the price in usdt
    coin_dt_usdt[, price_usdt := ifelse(pair == &quot;USDT_BTC&quot;, USDT_BTC_price, weightedaverage * USDT_BTC_price)]
    
    # now get rid of the extra columns
    coin_dt_usdt[, c(&quot;USDT_BTC_price&quot;, &quot;USDT_BTC_pair&quot;) := NULL]
    
    # we need to change some column names
    col_names_to_change <- c(&quot;pair&quot;, &quot;high&quot;, &quot;low&quot;, &quot;open&quot;, &quot;close&quot;, &quot;volume&quot;, &quot;quotevolume&quot;, &quot;weightedaverage&quot;)
    col_names <- names(coin_dt_usdt)
    col_names[col_names %in% col_names_to_change] <- paste0(col_names_to_change, '_btc')
    
    setnames(coin_dt_usdt, col_names)
    
    # add a column for the usdt pair
    coin_dt_usdt[, pair_usdt := gsub(&quot;BTC_&quot;, &quot;USDT_&quot;, pair_btc)]
    
    # adjust col order
    setcolorder(coin_dt_usdt, c(1:10, 12, 11))
    
    # set key again
    setkey(coin_dt_usdt, index)
    
    # now get rid of the index column since it is not needed anymore
    coin_dt_usdt[, index := NULL]
    
    # now put together the return list  
    return_list <- list(alt_chart_list = chart_list, alt_dt = coin_dt, alt_usdt_dt = coin_dt_usdt)
  }else{
    return_list <- list(alt_chart_list = chart_list, alt_dt = coin_dt)
  }
  
  return(return_list)
}" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #A6E22E">get_alt_data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">tz</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;UTC&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">coin</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;ETH&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;LTC&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">add_bitcoin</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">return_in_USDT</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">from</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2017-01-01&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">to</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2018-04-09&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">period</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;D&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">verbose</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">FALSE</span><span style="color: #F8F8F2">){</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># We will be using the public API</span></span>
<span class="line"><span style="color: #F8F8F2">  poloniex.public </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> PoloniexPublicAPI()</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># set the time zone to utc</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #66D9EF">Sys.setenv</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">TZ</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> tz)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># convert from and to into time obj</span></span>
<span class="line"><span style="color: #F8F8F2">  from  </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">as.POSIXct</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">paste</span><span style="color: #F8F8F2">(from, tz, </span><span style="color: #FD971F">sep</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;&quot;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">  to    </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">as.POSIXct</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">paste</span><span style="color: #F8F8F2">(to, tz, </span><span style="color: #FD971F">sep</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;&quot;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># lists to store data.tables and xts objects</span></span>
<span class="line"><span style="color: #F8F8F2">  chart_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list</span><span style="color: #F8F8F2">()</span></span>
<span class="line"><span style="color: #F8F8F2">  dt_list    </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list</span><span style="color: #F8F8F2">()</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># make sure the coin pair is in upper case</span></span>
<span class="line"><span style="color: #F8F8F2">  coin       </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">toupper</span><span style="color: #F8F8F2">(coin)</span></span>
<span class="line"><span style="color: #F8F8F2">  coin_pairs </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">paste0</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;BTC_&quot;</span><span style="color: #F8F8F2">, coin[coin </span><span style="color: #F92672">!=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC&quot;</span><span style="color: #F8F8F2">])</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">if</span><span style="color: #F8F8F2">(add_bitcoin </span><span style="color: #F92672">|</span><span style="color: #F8F8F2"> return_in_USDT) coin_pairs </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;USDT_BTC&quot;</span><span style="color: #F8F8F2">, coin_pairs)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># loop over the coins to get the data</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">for</span><span style="color: #F8F8F2">(i </span><span style="color: #F92672">in</span><span style="color: #F8F8F2"> coin_pairs){</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #F92672">if</span><span style="color: #F8F8F2">(verbose)</span></span>
<span class="line"><span style="color: #F8F8F2">      </span><span style="color: #F92672">invisible</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">cat</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&#39;</span><span style="color: #AE81FF">\t</span><span style="color: #E6DB74">Getting data for &#39;</span><span style="color: #F8F8F2">, i, </span><span style="color: #E6DB74">&#39; pair</span><span style="color: #AE81FF">\n</span><span style="color: #E6DB74">&#39;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># this is a list that will contain the chart data for each coin pair</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #66D9EF">try</span><span style="color: #F8F8F2">(chart_list[[i]] </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ReturnChartData(</span><span style="color: #FD971F">theObject</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> poloniex.public</span></span>
<span class="line"><span style="color: #F8F8F2">                                       , pair      = i</span></span>
<span class="line"><span style="color: #F8F8F2">                                       , from      = from</span></span>
<span class="line"><span style="color: #F8F8F2">                                       , to        = to</span></span>
<span class="line"><span style="color: #F8F8F2">                                       , period    = period)</span></span>
<span class="line"><span style="color: #F8F8F2">        , </span><span style="color: #FD971F">silent</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># list to contain data.tables </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #66D9EF">try</span><span style="color: #F8F8F2">(dt_list[[i]] </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(chart_list[[i]]), </span><span style="color: #FD971F">silent</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">  }</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># convert to data.table and make sure to add a column containing the pairs</span></span>
<span class="line"><span style="color: #F8F8F2">  coin_dt </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> rbindlist(</span><span style="color: #FD971F">l</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> dt_list, </span><span style="color: #FD971F">use.names</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">idcol</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;pair&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># return data in usdt prices</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">if</span><span style="color: #F8F8F2">(return_in_USDT){</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># to get the price of the alt coin in usdt is not that simple but we&#39;ll do it</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># get a DT of the btc_usdt pair</span></span>
<span class="line"><span style="color: #F8F8F2">    btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> coin_dt[pair </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;USDT_BTC&quot;</span><span style="color: #F8F8F2">]</span></span>
<span class="line"><span style="color: #F8F8F2">    btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_usd[, .(index, pair, weightedaverage)]</span></span>
<span class="line"><span style="color: #F8F8F2">    setnames(btc_usd, </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;Date&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;USDT_BTC_pair&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;USDT_BTC_price&quot;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># get DT with only alt coins</span></span>
<span class="line"><span style="color: #F8F8F2">    alt_coins </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> copy(coin_dt)</span><span style="color: #88846F">#[pair != &quot;USDT_BTC&quot;]</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now we need to add an index to the alt_coins table, but first we have to rename the index column</span></span>
<span class="line"><span style="color: #F8F8F2">    alt_coins[, Date </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> index]</span></span>
<span class="line"><span style="color: #F8F8F2">    alt_coins[, index </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">1</span><span style="color: #F92672">:</span><span style="color: #F8F8F2">.N]</span></span>
<span class="line"><span style="color: #F8F8F2">    setkey(alt_coins, index)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now merge the data tables</span></span>
<span class="line"><span style="color: #F8F8F2">    coin_dt_usdt </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">merge</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> alt_coins, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> btc_usd, </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Date&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now calculate the price in usdt</span></span>
<span class="line"><span style="color: #F8F8F2">    coin_dt_usdt[, price_usdt </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">ifelse</span><span style="color: #F8F8F2">(pair </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;USDT_BTC&quot;</span><span style="color: #F8F8F2">, USDT_BTC_price, weightedaverage </span><span style="color: #F92672">*</span><span style="color: #F8F8F2"> USDT_BTC_price)]</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now get rid of the extra columns</span></span>
<span class="line"><span style="color: #F8F8F2">    coin_dt_usdt[, </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;USDT_BTC_price&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;USDT_BTC_pair&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">NULL</span><span style="color: #F8F8F2">]</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># we need to change some column names</span></span>
<span class="line"><span style="color: #F8F8F2">    col_names_to_change </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;pair&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;high&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;low&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;open&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;close&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;volume&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;quotevolume&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;weightedaverage&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">    col_names </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">names</span><span style="color: #F8F8F2">(coin_dt_usdt)</span></span>
<span class="line"><span style="color: #F8F8F2">    col_names[col_names </span><span style="color: #F92672">%in%</span><span style="color: #F8F8F2"> col_names_to_change] </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">paste0</span><span style="color: #F8F8F2">(col_names_to_change, </span><span style="color: #E6DB74">&#39;_btc&#39;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    setnames(coin_dt_usdt, col_names)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># add a column for the usdt pair</span></span>
<span class="line"><span style="color: #F8F8F2">    coin_dt_usdt[, pair_usdt </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">gsub</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;BTC_&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;USDT_&quot;</span><span style="color: #F8F8F2">, pair_btc)]</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># adjust col order</span></span>
<span class="line"><span style="color: #F8F8F2">    setcolorder(coin_dt_usdt, </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F92672">:</span><span style="color: #AE81FF">10</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">12</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">11</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># set key again</span></span>
<span class="line"><span style="color: #F8F8F2">    setkey(coin_dt_usdt, index)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now get rid of the index column since it is not needed anymore</span></span>
<span class="line"><span style="color: #F8F8F2">    coin_dt_usdt[, index </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">NULL</span><span style="color: #F8F8F2">]</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now put together the return list  </span></span>
<span class="line"><span style="color: #F8F8F2">    return_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">alt_chart_list</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> chart_list, </span><span style="color: #FD971F">alt_dt</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> coin_dt, </span><span style="color: #FD971F">alt_usdt_dt</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> coin_dt_usdt)</span></span>
<span class="line"><span style="color: #F8F8F2">  }</span><span style="color: #F92672">else</span><span style="color: #F8F8F2">{</span></span>
<span class="line"><span style="color: #F8F8F2">    return_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">alt_chart_list</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> chart_list, </span><span style="color: #FD971F">alt_dt</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> coin_dt)</span></span>
<span class="line"><span style="color: #F8F8F2">  }</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">return</span><span style="color: #F8F8F2">(return_list)</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span></code></pre></div>



<p>The function above can be used to download data for multiple coins at the same time. It returns a list; the alt_usdt_dt element used below is a data.table with data for all coins in the function call. Even if the user doesn’t add bitcoin to the list of coins, the function adds bitcoin by default. This can be deactivated with the add_bitcoin argument, although bitcoin is still added whenever return_in_USDT is TRUE, because the conversion to USDT needs the USDT_BTC pair. Here is an example:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.6875px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# get alt data for some coins
alt_data <- get_alt_data(return_in_USDT = T
                         , from = &quot;2015-01-01&quot;
                         , coin = c('ETH','XRP', 'BCH', 'LTC', 'NEO', 'XMR', 'DASH', 'XEM'))[['alt_usdt_dt']]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># get alt data for some coins</span></span>
<span class="line"><span style="color: #F8F8F2">alt_data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> get_alt_data(</span><span style="color: #FD971F">return_in_USDT</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> T</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">from</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2015-01-01&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">coin</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&#39;ETH&#39;</span><span style="color: #F8F8F2">,</span><span style="color: #E6DB74">&#39;XRP&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;BCH&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;LTC&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;NEO&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;XMR&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;DASH&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;XEM&#39;</span><span style="color: #F8F8F2">))[[</span><span style="color: #E6DB74">&#39;alt_usdt_dt&#39;</span><span style="color: #F8F8F2">]]</span></span></code></pre></div>



<p id="a8ab">Let’s look at the data we just downloaded:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704856872558594px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="head(alt_data)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">head</span><span style="color: #F8F8F2">(alt_data)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="249" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_yabOeX6Rqn8e6eDO1XzGFg.webp" alt="" class="wp-image-4908" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_yabOeX6Rqn8e6eDO1XzGFg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_yabOeX6Rqn8e6eDO1XzGFg-500x150.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_yabOeX6Rqn8e6eDO1XzGFg-150x45.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_yabOeX6Rqn8e6eDO1XzGFg-768x231.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p id="9a46">The table shows the date, OHLC, volume, and weighted average price in BTC. It also shows the pair, plus the price in USD that the function added (strictly speaking USDT, a token designed to track the dollar).</p>
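
<p>To make the conversion concrete, here is a minimal sketch of the same idea on made-up numbers: join the USDT_BTC price onto every row by date, then multiply each BTC-quoted price by it. The toy table and its prices are invented for illustration; only the column names mirror the function above.</p>

<pre class="wp-block-code"><code>library(data.table)

# Hypothetical toy data: one altcoin quoted in BTC, plus BTC quoted in USDT
alt &lt;- data.table(
  Date = as.Date(c("2018-01-01", "2018-01-02", "2018-01-01", "2018-01-02")),
  pair = c("BTC_ETH", "BTC_ETH", "USDT_BTC", "USDT_BTC"),
  weightedaverage = c(0.05, 0.06, 14000, 15000)
)

# Keep only the BTC price in USDT, one row per date
btc_usd &lt;- alt[pair == "USDT_BTC", .(Date, USDT_BTC_price = weightedaverage)]

# Attach that day's BTC price to every row
toy &lt;- merge(alt, btc_usd, by = "Date")

# BTC is already quoted in USDT; everything else is converted through BTC
toy[, price_usdt := ifelse(pair == "USDT_BTC",
                           USDT_BTC_price,
                           weightedaverage * USDT_BTC_price)]
print(toy)</code></pre>

<p>With these numbers, ETH works out to 0.05 × 14000 = 700 USDT on the first day and 0.06 × 15000 = 900 USDT on the second.</p>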



<h2 class="wp-block-heading" id="c5e5">Bitcoin-Altcoins Correlations</h2>



<p id="dbac">Whenever I look at the prices of the coins available on my <a href="https://www.coinbase.com/" target="_blank" rel="noreferrer noopener">Coinbase</a> app, I am always struck by the similarity of the price trends among the four coins available on Coinbase: BTC, ETH, BCH, and LTC (see the figure below). So I thought it would be a good idea to explore the correlation in price trends between altcoins and Bitcoin.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="661" height="1024" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA-661x1024.webp" alt="" class="wp-image-4909" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA-661x1024.webp 661w, https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA-323x500.webp 323w, https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA-97x150.webp 97w, https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA-768x1190.webp 768w, https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA.webp 786w" sizes="auto, (max-width: 661px) 100vw, 661px" /></figure>
</div>


<p id="47a1">Let’s look at the price trends of the coins we just downloaded. To better see potential correlations, I am going to zoom in on 2018 only.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70486307144165px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="p <- ggplot(alt_data[year(Date) == 2018], aes(x = Date, y =  price_usdt, col = pair_usdt)) + geom_line()
p <- p + facet_wrap(~pair_usdt, scales = &quot;free&quot;, ncol = 3) + theme_minimal() + theme(legend.position=&quot;none&quot;) + ylab(&quot;Price (USD)&quot;)
p" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[year(Date) </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2018</span><span style="color: #F8F8F2">], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  price_usdt, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line()</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> facet_wrap(</span><span style="color: #F92672">~</span><span style="color: #F8F8F2">pair_usdt, </span><span style="color: #FD971F">scales</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;free&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">ncol</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_minimal() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme(</span><span style="color: #FD971F">legend.position</span><span style="color: #F92672">=</span><span style="color: #E6DB74">&quot;none&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Price (USD)&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">p</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="499" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_lXmT2c5hQ6tik7sNfwvFPw.webp" alt="" class="wp-image-4910" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_lXmT2c5hQ6tik7sNfwvFPw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_lXmT2c5hQ6tik7sNfwvFPw-500x301.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_lXmT2c5hQ6tik7sNfwvFPw-150x90.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_lXmT2c5hQ6tik7sNfwvFPw-768x463.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Prices of Bitcoin and several altcoins in 2018</figcaption></figure>
</div>


<p id="3bf6">The figure above shows that some coins seem to be more correlated with Bitcoin than others. It also shows that the relationship between Bitcoin and a given coin varies over time. More on this below.</p>



<p id="73c0">Trying to find correlations between time series using the Pearson correlation coefficient, or other metrics meant for stationary data, can give misleading results, because price series like these are generally not stationary. Similar trends in time series data can also be very misleading; a nice article on this topic can be found <a href="https://svds.com/avoiding-common-mistakes-with-time-series/" rel="noreferrer noopener" target="_blank">here</a>. And always remember that <strong>Correlation doesn’t guarantee Causation</strong>.</p>
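
<p>To see why raw levels can mislead, here is a small simulation sketch (it is not part of the original analysis): two random walks generated independently, so any correlation between them is pure chance. The seed and the series length are arbitrary choices.</p>

<pre class="wp-block-code"><code>set.seed(42)
n &lt;- 1000

# Two independent random walks: non-stationary by construction
x &lt;- cumsum(rnorm(n))
y &lt;- cumsum(rnorm(n))

# Pearson correlation of the raw levels
cor_levels &lt;- cor(x, y)

# Pearson correlation of the step-to-step changes
cor_changes &lt;- cor(diff(x), diff(y))

print(c(levels = cor_levels, changes = cor_changes))</code></pre>

<p>Independent random walks often show a sizable correlation in their levels even though nothing links them, while the correlation of their changes stays close to zero. That is the motivation for working with daily changes rather than raw prices.</p>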



<p id="0f96">The bottom line: one has to be careful when cross-correlating time series. To perform a proper correlation analysis, we need to add some new variables to our table.</p>



<h2 class="wp-block-heading" id="510f">Percentage Daily Change</h2>



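<p id="0ed2">The percentage daily change is the relative change in a coin’s price from one day to the next. Let’s add that to the table. Notice that we are calculating this variable using the USD price, and not the price in Bitcoin. The Delt function used here comes from the quantmod package and returns the change as a fraction, so 0.05 means 5%.</p>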



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# add daily price change
alt_data[, pct_change := Delt(price_usdt), by = pair_usdt]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># add daily price change</span></span>
<span class="line"><span style="color: #F8F8F2">alt_data[, pct_change </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> Delt(price_usdt), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt]</span></span></code></pre></div>
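
<p>As a sanity check on what this column holds, here is a sketch of the same calculation written out by hand with data.table’s shift(), on made-up prices. With its default arguments, Delt computes the one-period arithmetic change, (p_t - p_{t-1}) / p_{t-1}. The pair names USDT_AAA and USDT_BBB and their prices are hypothetical.</p>

<pre class="wp-block-code"><code>library(data.table)

# Hypothetical toy prices for two pairs, already sorted by date within each pair
toy &lt;- data.table(
  pair_usdt  = rep(c("USDT_AAA", "USDT_BBB"), each = 3),
  price_usdt = c(100, 110, 99, 2, 3, 3)
)

# today's price divided by yesterday's, minus 1, computed separately per pair
toy[, pct_change := price_usdt / shift(price_usdt) - 1, by = pair_usdt]
print(toy)</code></pre>

<p>The first row of each pair is NA because there is no previous day to compare against. Multiply by 100 if you want the value as a percentage.</p>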



<h2 class="wp-block-heading" id="a775">Normalized Price in USD</h2>



<p id="a42d">Since prices vary a lot, both over time for the same coin and between coins, we will add a normalized price in USD: each coin’s price divided by that coin’s maximum price in the data. This variable makes it easy to plot the prices of different coins on the same figure.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70489501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# add normalized prices in usdt
alt_data[, price_usdt_norm := price_usdt/max(price_usdt), by = pair_usdt]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># add normalized prices in usdt</span></span>
<span class="line"><span style="color: #F8F8F2">alt_data[, price_usdt_norm </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> price_usdt</span><span style="color: #F92672">/</span><span style="color: #66D9EF">max</span><span style="color: #F8F8F2">(price_usdt), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt]</span></span></code></pre></div>



<p id="9cec">Now that we have the normalized prices, let’s look at the prices of Bitcoin and Litecoin on the same figure. We’ll start with 2018 so we can better see any possible correlations.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="p <- ggplot(alt_data[year(Date) == 2018 & pair_usdt %like% &quot;BTC|LTC&quot;], aes(x = Date, y =  price_usdt_norm, col = pair_usdt)) + geom_line()
p <- p + theme_minimal() + ylab(&quot;Price (USD)&quot;)
ggplotly(p)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[year(Date) </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2018</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> pair_usdt </span><span style="color: #F92672">%like%</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC|LTC&quot;</span><span style="color: #F8F8F2">], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  price_usdt_norm, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line()</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_minimal() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Price (USD)&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">ggplotly(p)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="483" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_HHs59LsNRYT3sbL8I0Sdmg.webp" alt="" class="wp-image-4913" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_HHs59LsNRYT3sbL8I0Sdmg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_HHs59LsNRYT3sbL8I0Sdmg-500x292.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_HHs59LsNRYT3sbL8I0Sdmg-150x88.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_HHs59LsNRYT3sbL8I0Sdmg-768x448.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p>The price trends of BTC and LTC are very similar. Let’s look at the price trends for 2017:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="p <- ggplot(alt_data[year(Date) == 2017 & pair_usdt %like% &quot;BTC|LTC&quot;], aes(x = Date, y =  price_usdt_norm, col = pair_usdt)) + geom_line()
p <- p + theme_minimal() + ylab(&quot;Price (USD)&quot;)
ggplotly(p)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[year(Date) </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2017</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> pair_usdt </span><span style="color: #F92672">%like%</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC|LTC&quot;</span><span style="color: #F8F8F2">], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  price_usdt_norm, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line()</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_minimal() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Price (USD)&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">ggplotly(p)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="480" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_6t9j_F4QaweBgp_Wl7Ntwg.webp" alt="" class="wp-image-4911" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_6t9j_F4QaweBgp_Wl7Ntwg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_6t9j_F4QaweBgp_Wl7Ntwg-500x290.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_6t9j_F4QaweBgp_Wl7Ntwg-150x87.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_6t9j_F4QaweBgp_Wl7Ntwg-768x445.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Prices of Bitcoin and LTC in 2017</figcaption></figure>
</div>


<p id="c49a">It seems we need to zoom in on the last quarter of 2017. Let’s do that:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704862594604492px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="p <- ggplot(alt_data[Date >= &quot;2017-10-01&quot;  & Date < &quot;2018-01-01&quot; &  pair_usdt %like% &quot;BTC|LTC&quot;], aes(x = Date, y =  price_usdt_norm, col = pair_usdt)) + geom_line()
p <- p + theme_minimal() + ylab(&quot;Price (USD)&quot;)
ggplotly(p)
" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[Date </span><span style="color: #F92672">>=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2017-10-01&quot;</span><span style="color: #F8F8F2">  </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> Date </span><span style="color: #F92672"><</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2018-01-01&quot;</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">&</span><span style="color: #F8F8F2">  pair_usdt </span><span style="color: #F92672">%like%</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC|LTC&quot;</span><span style="color: #F8F8F2">], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  price_usdt_norm, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: 
#F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line()</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_minimal() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Price (USD)&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">ggplotly(p)</span></span>
<span class="line"></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="490" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_yEZOrUPaK-ZhizMdmeFCmA.webp" alt="" class="wp-image-4914" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_yEZOrUPaK-ZhizMdmeFCmA.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_yEZOrUPaK-ZhizMdmeFCmA-500x296.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_yEZOrUPaK-ZhizMdmeFCmA-150x89.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_yEZOrUPaK-ZhizMdmeFCmA-768x454.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Prices of Bitcoin and LTC in the last quarter of 2017<br></figcaption></figure>
</div>


<p id="fcd8">It is clear from the figure above that the correlation between the prices of Bitcoin and LTC varies over time. Note how the highest price for Bitcoin, on December 17, 2017, preceded that of LTC, on December 19, 2017, by two days. This wasn’t the case for the ATH, which occurred on January 6th, 2018 for both coins.</p>



<h2 class="wp-block-heading" id="6009">Static Correlations </h2>



<h3 class="wp-block-heading" id="6009">(and why you shouldn’t use them with crypto!)</h3>



<p id="9035">Up until now I haven’t calculated any correlations between the prices of different coins. You might ask why we should even care about correlations in time series. Well, in the case of financial time series data, if one can show that a correlation exists between two time series, then one can use this correlation to model or predict the price movement of one coin or stock given the price trends of another.</p>



<p id="9035"><strong>However</strong>, as we mentioned earlier, correlation for time series data is not static; it changes over time. Let’s actually show that. To do so, I am going to calculate the <a href="https://en.wikipedia.org/wiki/Pearson_correlation_coefficient" target="_blank" rel="noreferrer noopener">Pearson correlation coefficient</a>. In simple words, the Pearson correlation coefficient for two vectors of data measures how strongly the two vectors are linearly related. Its value ranges from -1, perfectly anti-correlated, to 1, perfectly correlated. So the correlation coefficient of a series of numbers with itself is 1. A value of zero means there is no linear correlation. Remember, this only works for static data, where the relationship between the two series does not change over time.</p>
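
<p>As a quick sanity check of these properties, here is a minimal sketch on a made-up vector. The name <code>x</code> and its values are purely illustrative and are not part of our crypto data.</p>

<pre><code># toy vector, illustrative only
x &lt;- c(-2, -1, 0, 1, 2)

cor(x, x)    # a series with itself: perfectly correlated
cor(x, -x)   # a series with its mirror image: perfectly anti-correlated
cor(x, x^2)  # a clear quadratic relationship, yet no linear correlation
</code></pre>

<p>The three calls return 1, -1, and 0. The last one is a reminder that Pearson’s coefficient only measures linear association, so a value of zero does not mean the two vectors are unrelated.</p>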



<p id="62e8">To compute correlations on our data, I first need to reshape it:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70489501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# subset data, only keep the date, the pair, and the price
alt_data_sub <- alt_data[, .(Date, pair_usdt, price_usdt)]
# convert to wide format 
alt_data_sub <- spread(data = alt_data_sub, key = &quot;pair_usdt&quot;, value = &quot;price_usdt&quot;)
# clean column names
setnames(alt_data_sub, gsub(&quot;USDT_&quot;, &quot;&quot;, colnames(alt_data_sub)))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># subset data, only keep the date, the pair, and the price</span></span>
<span class="line"><span style="color: #F8F8F2">alt_data_sub </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> alt_data[, .(Date, pair_usdt, price_usdt)]</span></span>
<span class="line"><span style="color: #88846F"># convert to wide format </span></span>
<span class="line"><span style="color: #F8F8F2">alt_data_sub </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> spread(</span><span style="color: #FD971F">data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> alt_data_sub, </span><span style="color: #FD971F">key</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;pair_usdt&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">value</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;price_usdt&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #88846F"># clean column names</span></span>
<span class="line"><span style="color: #F8F8F2">setnames(alt_data_sub, </span><span style="color: #66D9EF">gsub</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;USDT_&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #66D9EF">colnames</span><span style="color: #F8F8F2">(alt_data_sub)))</span></span></code></pre></div>



<p>The new table contains the date along with one column per coin holding that coin’s price in USDT.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.7048492431640625px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="tail(alt_data_sub)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">tail</span><span style="color: #F8F8F2">(alt_data_sub)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="236" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_3onSqsAzx-A1jNaprt9BMw.webp" alt="" class="wp-image-4915" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_3onSqsAzx-A1jNaprt9BMw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_3onSqsAzx-A1jNaprt9BMw-500x143.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_3onSqsAzx-A1jNaprt9BMw-150x43.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_3onSqsAzx-A1jNaprt9BMw-768x219.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>
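
<p>If you prefer to stay within <code>data.table</code>, the same long-to-wide reshape can be sketched with <code>dcast()</code>. The toy table below is made up for illustration; only the column names mirror ours.</p>

<pre><code>library(data.table)

# made-up long-format prices (illustrative values only)
toy_long &lt;- data.table(
  Date       = as.Date(c(&quot;2018-01-01&quot;, &quot;2018-01-01&quot;, &quot;2018-01-02&quot;, &quot;2018-01-02&quot;)),
  pair_usdt  = c(&quot;USDT_BTC&quot;, &quot;USDT_LTC&quot;, &quot;USDT_BTC&quot;, &quot;USDT_LTC&quot;),
  price_usdt = c(100, 10, 110, 12)
)

# one row per date, one column per pair
toy_wide &lt;- dcast(toy_long, Date ~ pair_usdt, value.var = &quot;price_usdt&quot;)

# strip the USDT_ prefix from the column names, as we did above
setnames(toy_wide, gsub(&quot;USDT_&quot;, &quot;&quot;, colnames(toy_wide)))
toy_wide
</code></pre>

<p>The result has one row per date and the columns <code>Date</code>, <code>BTC</code>, and <code>LTC</code>, which is the same shape as <code>alt_data_sub</code>.</p>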


<p>Again, what I am doing here is not correct; I am just trying to show you why we shouldn’t compute static correlations on crypto data. Now we’ll calculate the Pearson correlation coefficient between the coins we have, and then make a nice plot of these coefficients.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704833984375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# calculate the correlation matrix
M <- cor(alt_data_sub[, -1], use = &quot;complete.obs&quot;) # notice how we are ignoring missing data with the last argument
# plot the correlation matrix
corrplot.mixed(corr = M, upper = &quot;ellipse&quot;, lower = &quot;number&quot;, order = &quot;AOE&quot;, tl.col = &quot;black&quot;)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># calculate the correlation matrix</span></span>
<span class="line"><span style="color: #F8F8F2">M </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">cor</span><span style="color: #F8F8F2">(alt_data_sub[, </span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">], </span><span style="color: #FD971F">use</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;complete.obs&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #88846F"># notice how we are ignoring missing data with the last argument</span></span>
<span class="line"><span style="color: #88846F"># plot the correlation matrix</span></span>
<span class="line"><span style="color: #F8F8F2">corrplot.mixed(</span><span style="color: #FD971F">corr</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> M, </span><span style="color: #FD971F">upper</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;ellipse&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">lower</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;number&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">order</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;AOE&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">tl.col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;black&quot;</span><span style="color: #F8F8F2">)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="826" height="722" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_mjAaFbtDf5GeYp6RCppeSA.webp" alt="" class="wp-image-4916" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_mjAaFbtDf5GeYp6RCppeSA.webp 826w, https://analyticadss.com/wp-content/uploads/2022/12/1_mjAaFbtDf5GeYp6RCppeSA-500x437.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_mjAaFbtDf5GeYp6RCppeSA-150x131.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_mjAaFbtDf5GeYp6RCppeSA-768x671.webp 768w" sizes="auto, (max-width: 826px) 100vw, 826px" /></figure>
</div>


<p id="99ac">The figure above shows the correlation coefficients between the different coins. It is easy to read visually: the darker the color of the ellipse, and the more diagonal the ellipse, the higher the correlation coefficient. Of course, you can also just look at the numbers in the bottom-left part of the figure to get the value of the coefficient between two coins :). The figure shows how highly correlated the prices of cryptocurrencies can be. For example, XRP and XEM have a correlation coefficient of 0.93. The highest correlation seems to be between BCH and DASH, at 0.97.</p>



<p id="bcf4">All of the correlation coefficients we see in the figure above are significant. The question is: do these correlations vary over time? To answer it, I will calculate the correlation coefficient between Bitcoin and DASH on a monthly basis (you can do this for any time period) and show that the coefficient varies greatly over time. Let’s do that:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395828247070312px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# subset the data
btc_dash <- alt_data_sub[, .(Date, BTC, DASH)]
# add a year_month column
btc_dash[, year_month := as.yearmon(Date)]
# calculate the correlation coefficient on a monthly basis
btc_dash_2 <- btc_dash[, cor(BTC, DASH), by = year_month]
# now plot the correlation coefficient as a function of month and year
plot(btc_dash_2$year_month, btc_dash_2$V1, xlab = &quot;Year-Month&quot;, main = &quot;Correlation Coeff. Between BTC and DASH Over time&quot;
     , ylab = &quot;Correlation Coefficient&quot;, type = &quot;b&quot;, pch = 19, col = ifelse(btc_dash_2$V1 > 0, &quot;blue&quot;, &quot;red&quot;)
     , ylim = c(-1, 1))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># subset the data</span></span>
<span class="line"><span style="color: #F8F8F2">btc_dash </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> alt_data_sub[, .(Date, BTC, DASH)]</span></span>
<span class="line"><span style="color: #88846F"># add a year_month column</span></span>
<span class="line"><span style="color: #F8F8F2">btc_dash[, year_month </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> as.yearmon(Date)]</span></span>
<span class="line"><span style="color: #88846F"># calculate the correlation coefficient on a monthly basis</span></span>
<span class="line"><span style="color: #F8F8F2">btc_dash_2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_dash[, </span><span style="color: #66D9EF">cor</span><span style="color: #F8F8F2">(BTC, DASH), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> year_month]</span></span>
<span class="line"><span style="color: #88846F"># now plot the correlation coefficient as a function of month and year</span></span>
<span class="line"><span style="color: #66D9EF">plot</span><span style="color: #F8F8F2">(btc_dash_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">year_month, btc_dash_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1, </span><span style="color: #FD971F">xlab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Year-Month&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">main</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coeff. Between BTC and DASH Over time&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coefficient&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">type</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;b&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">pch</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">19</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">ifelse</span><span style="color: #F8F8F2">(btc_dash_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1 </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;blue&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;red&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylim</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span></code></pre></div>



<p id="094a">This is interesting: the monthly correlation coefficient between Bitcoin and DASH ranges from <strong>-0.91</strong> (strongly anti-correlated) to <strong>0.98</strong> (strongly correlated). This is why <strong><em>you should never use static correlation metrics with crypto data!</em></strong></p>



<p id="78e2">A good blog post on this same topic was written by <a href="https://twitter.com/tomeff" rel="noreferrer noopener" target="_blank">Tom Fawcett</a> from <a href="https://www.svds.com/" rel="noreferrer noopener" target="_blank">Silicon Valley Data Science</a> and can be found <a href="https://www.svds.com/avoiding-common-mistakes-with-time-series/" rel="noreferrer noopener" target="_blank">here</a>. In his post, Tom shows with simple simulations why static correlations should never be used with time series.</p>
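


<p>To get a feel for the problem, here is a minimal sketch in base R. It is my own toy simulation, not a reproduction of the simulations in Tom's post: two random walks are generated independently (the names <code>x</code>, <code>y</code> and the seed are arbitrary choices), and the correlation is computed once on the levels and once on the step-to-step changes. The correlation of the levels is often far from zero even though the two series have nothing to do with each other, while the correlation of the changes stays close to zero.</p>



<pre><code># two independent random walks (simulated, not real price data)
set.seed(42)
n &lt;- 1000
x &lt;- cumsum(rnorm(n))
y &lt;- cumsum(rnorm(n))

# correlation of the levels: can look "significant" purely by chance
cor_levels &lt;- cor(x, y)

# correlation of the step-to-step changes: close to zero, as it should be
cor_changes &lt;- cor(diff(x), diff(y))

print(c(levels = cor_levels, changes = cor_changes))</code></pre>



<p>Re-running this with different seeds shows how much the level correlation jumps around, which is the same instability we saw in the monthly BTC-DASH numbers.</p>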



<h3 class="wp-block-heading" id="2441">Correlation Networks</h3>



<p id="a16e">There is one more plot I would like to make: a network plot of the correlations between the different coins. A correlation network plot shows the strength of the correlation between each pair of coins. Again, these correlations are time dependent, so the figure will change over time, but I still think it is a good figure to make. Here it is:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# we will be using the great corrr package for this work
# build the correlation matrix, just like we did before
# the code snippets below are taken from this post (a great blog, BTW):
# http://www.business-science.io/timeseries-analysis/2017/07/30/tidy-timeseries-analysis-pt-3.html
corr_2 <- correlate(alt_data_sub[, -1])
# make the network plot
corr_net <- corr_2 %>%
  network_plot(colours = c(palette_light()[[2]], &quot;white&quot;, palette_light()[[4]]), legend = TRUE) +
  labs(
    title = &quot;Static Correlations of some Crypto Currencies&quot;,
    subtitle = &quot;2014 through 2018&quot;
  ) +
  theme_tq() +
  theme(legend.position = &quot;bottom&quot;)
corr_net" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># we will be using the great corrr package for this work</span></span>
<span class="line"><span style="color: #88846F"># build the correlation matrix, just like we did before</span></span>
<span class="line"><span style="color: #88846F"># the code snippets below are taken from this post (a great blog, BTW):</span></span>
<span class="line"><span style="color: #88846F"># http://www.business-science.io/timeseries-analysis/2017/07/30/tidy-timeseries-analysis-pt-3.html</span></span>
<span class="line"><span style="color: #F8F8F2">corr_2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> correlate(alt_data_sub[, </span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">])</span></span>
<span class="line"><span style="color: #88846F"># make the network plot</span></span>
<span class="line"><span style="color: #F8F8F2">corr_net </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> corr_2 </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  network_plot(</span><span style="color: #FD971F">colours</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(palette_light()[[</span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">]], </span><span style="color: #E6DB74">&quot;white&quot;</span><span style="color: #F8F8F2">, palette_light()[[</span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">]]), </span><span style="color: #FD971F">legend</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  labs(</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Static Correlations of some Crypto Currencies&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">subtitle</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2014 through 2018&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">  ) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme_tq() </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme(</span><span style="color: #FD971F">legend.position</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;bottom&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">corr_net</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="519" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_brRhP1L95H00I6m3TcD-2A.webp" alt="" class="wp-image-4917" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_brRhP1L95H00I6m3TcD-2A.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_brRhP1L95H00I6m3TcD-2A-500x313.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_brRhP1L95H00I6m3TcD-2A-150x94.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_brRhP1L95H00I6m3TcD-2A-768x481.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p id="9225">The figure above shows a network that measures how strongly correlated the prices of the coins under study are. The darker the edge (line) connecting two coins, and the closer the two coins sit in the network, the stronger the correlation between them.</p>



<p id="851b">From the figure, it seems that XMR, LTC, and BTC are at the heart of this network, while BCH seems to be the least correlated with the rest of the coins. Let’s see how the network plot changes between 2017 and 2018:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# subset the data and get correlation matrices
corr_2017 <- correlate(alt_data_sub[year(Date) == 2017][, -1])
corr_2018 <- correlate(alt_data_sub[year(Date) == 2018][, -1])
# build Network plots
corr_net_2017 <- corr_2017 %>%
  network_plot(colours = c(palette_light()[[2]], &quot;white&quot;, palette_light()[[4]]), legend = TRUE) +
  labs(
    title = &quot;Static Correlations of some Crypto Currencies&quot;,
    subtitle = &quot;2017&quot;
  ) +
  theme_tq() +
  theme(legend.position = &quot;bottom&quot;)
corr_net_2018 <- corr_2018 %>%
  network_plot(colours = c(palette_light()[[2]], &quot;white&quot;, palette_light()[[4]]), legend = TRUE) +
  labs(
    title = &quot;Static Correlations of some Crypto Currencies&quot;,
    subtitle = &quot;2018&quot;
  ) +
  theme_tq() +
  theme(legend.position = &quot;bottom&quot;)
# combine network plots
cow_net_plots <-plot_grid(corr_net_2017, corr_net_2018, ncol = 2)
title <- ggdraw() + 
    draw_label(label = 'Crypto Correlation Networks',
               fontface = 'bold', size = 18)
cow_out <- plot_grid(title, cow_net_plots, ncol=1, rel_heights=c(0.1, 1))
cow_out" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># subset the data and get correlation matrices</span></span>
<span class="line"><span style="color: #F8F8F2">corr_2017 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> correlate(alt_data_sub[year(Date) </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2017</span><span style="color: #F8F8F2">][, </span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">])</span></span>
<span class="line"><span style="color: #F8F8F2">corr_2018 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> correlate(alt_data_sub[year(Date) </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2018</span><span style="color: #F8F8F2">][, </span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">])</span></span>
<span class="line"><span style="color: #88846F"># build Network plots</span></span>
<span class="line"><span style="color: #F8F8F2">corr_net_2017 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> corr_2017 </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  network_plot(</span><span style="color: #FD971F">colours</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(palette_light()[[</span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">]], </span><span style="color: #E6DB74">&quot;white&quot;</span><span style="color: #F8F8F2">, palette_light()[[</span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">]]), </span><span style="color: #FD971F">legend</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  labs(</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Static Correlations of some Crypto Currencies&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">subtitle</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2017&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">  ) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme_tq() </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme(</span><span style="color: #FD971F">legend.position</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;bottom&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">corr_net_2018 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> corr_2018 </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  network_plot(</span><span style="color: #FD971F">colours</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(palette_light()[[</span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">]], </span><span style="color: #E6DB74">&quot;white&quot;</span><span style="color: #F8F8F2">, palette_light()[[</span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">]]), </span><span style="color: #FD971F">legend</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  labs(</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Static Correlations of some Crypto Currencies&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">subtitle</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2018&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">  ) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme_tq() </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme(</span><span style="color: #FD971F">legend.position</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;bottom&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #88846F"># combine network plots</span></span>
<span class="line"><span style="color: #F8F8F2">cow_net_plots </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2">plot_grid(corr_net_2017, corr_net_2018, </span><span style="color: #FD971F">ncol</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">title </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggdraw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> </span></span>
<span class="line"><span style="color: #F8F8F2">    draw_label(</span><span style="color: #FD971F">label</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&#39;Crypto Correlation Networks&#39;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">               </span><span style="color: #FD971F">fontface</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&#39;bold&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">size</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">18</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">cow_out </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> plot_grid(title, cow_net_plots, </span><span style="color: #FD971F">ncol</span><span style="color: #F92672">=</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">rel_heights</span><span style="color: #F92672">=</span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">0.1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">cow_out</span></span></code></pre></div>



<p id="058c">As can be seen, the correlation networks do change over time. This is not news, since we already saw in the previous section that the value of the correlation varies over time. (I know we showed this only for the BTC-DASH pair, but we’ll show that it is true for the rest of the coins in the next section.)</p>
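


<p>Here is a small base R sketch of why a single static number can hide this kind of change. The data is simulated and the column names (<code>year</code>, <code>a</code>, <code>b</code>) are made up for illustration: by construction, series <code>b</code> follows <code>a</code> in the first year and moves against it in the second. The per-year correlations come out strongly positive and strongly negative, while the correlation over the whole sample lands somewhere near zero.</p>



<pre><code># simulated toy data, NOT the crypto data used in this post
set.seed(1)
n &lt;- 250
year &lt;- rep(c(2017, 2018), each = n)
a &lt;- rnorm(2 * n)
# b follows a in 2017 and moves against it in 2018 (by construction)
b &lt;- ifelse(year == 2017, a, -a) + rnorm(2 * n, sd = 0.5)
toy &lt;- data.frame(year, a, b)

# one static correlation over the whole sample
static_cor &lt;- cor(toy$a, toy$b)

# one correlation per year
yearly_cor &lt;- sapply(split(toy, toy$year), function(d) cor(d$a, d$b))

print(static_cor)
print(yearly_cor)</code></pre>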



<h3 class="wp-block-heading" id="eace">Daily Returns Correlations</h3>



<p id="ffb8">Let’s look at the percentage daily changes of the altcoins between 2015 and today.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# plot the percent changes
p <- ggplot(alt_data[Date > ymd(&quot;2015-01-01&quot;)], aes(x = Date, y =  (100*pct_change), col = pair_usdt)) + geom_line()
p <- p + ggtitle(&quot;% Daily Returns over time&quot;) + ylab(&quot;Daily Return (%)&quot;) 
p <- p + theme_bw() + guides(col=guide_legend(title=&quot;Coin Pair&quot;))
ggplotly(p)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># plot the percent changes</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[Date </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2015-01-01&quot;</span><span style="color: #F8F8F2">)], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  (</span><span style="color: #AE81FF">100</span><span style="color: #F92672">*</span><span style="color: #F8F8F2">pct_change), </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line()</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ggtitle(</span><span style="color: #E6DB74">&quot;% Daily Returns over time&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Daily Return (%)&quot;</span><span style="color: #F8F8F2">) </span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> guides(</span><span style="color: #FD971F">col</span><span style="color: #F92672">=</span><span style="color: #F8F8F2">guide_legend(</span><span style="color: #FD971F">title</span><span style="color: #F92672">=</span><span style="color: #E6DB74">&quot;Coin Pair&quot;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">ggplotly(p)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="492" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_DF6wejykPy9QxICaxujyWg.webp" alt="" class="wp-image-4918" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_DF6wejykPy9QxICaxujyWg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_DF6wejykPy9QxICaxujyWg-500x297.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_DF6wejykPy9QxICaxujyWg-150x89.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_DF6wejykPy9QxICaxujyWg-768x456.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Percentage daily returns for some coins</figcaption></figure>
</div>


<p id="def1">Although the above figure is very cluttered, one thing is certain: percentage daily returns vary greatly for crypto. Let’s try to make this figure a bit easier to read:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="p <- ggplot(alt_data[Date > ymd(&quot;2015-01-01&quot;)], aes(x = Date, y =  (100*pct_change), col = pair_usdt)) + geom_line() + facet_wrap(~ pair_usdt)
p <- p + ggtitle(&quot;Percentage Daily Returns over time&quot;) + ylab(&quot;Daily Return (%)&quot;) 
p <- p + theme_bw() + theme(legend.position=&quot;none&quot;) 
ggplotly(p)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[Date </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2015-01-01&quot;</span><span style="color: #F8F8F2">)], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  (</span><span style="color: #AE81FF">100</span><span style="color: #F92672">*</span><span style="color: #F8F8F2">pct_change), </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> facet_wrap(</span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> pair_usdt)</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ggtitle(</span><span style="color: #E6DB74">&quot;Percentage Daily Returns over time&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Daily Return (%)&quot;</span><span style="color: #F8F8F2">) </span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme(</span><span style="color: #FD971F">legend.position</span><span style="color: #F92672">=</span><span style="color: #E6DB74">&quot;none&quot;</span><span style="color: #F8F8F2">) </span></span>
<span class="line"><span style="color: #F8F8F2">ggplotly(p)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="563" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_frUEGj6qn8sgyJOH87UsWA.webp" alt="" class="wp-image-4919" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_frUEGj6qn8sgyJOH87UsWA.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_frUEGj6qn8sgyJOH87UsWA-500x340.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_frUEGj6qn8sgyJOH87UsWA-150x102.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_frUEGj6qn8sgyJOH87UsWA-768x522.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Percentage daily returns for some coins</figcaption></figure>
</div>


<p id="dc7b">It is kind of surprising that Bitcoin has the least variability in daily returns. The nice big spike around April 2nd, 2017 shows a percentage daily return of ~88% for XRP. This is the highest daily return I have seen!</p>
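


<p>One way to put a number on "variability" is the standard deviation of the daily returns per coin. The sketch below uses base R on simulated prices for two made-up coins (<code>COIN_A</code> and <code>COIN_B</code> are hypothetical, as is the <code>close</code> column), and it assumes <code>pct_change</code> is the simple day-over-day return, i.e. today's close divided by yesterday's close minus one. The same idea should carry over to the real data by grouping on <code>pair_usdt</code>.</p>



<pre><code># simulated toy prices for two made-up coins, NOT the data used in this post
set.seed(7)
prices &lt;- data.frame(
  Date  = rep(seq(as.Date("2017-01-01"), by = "day", length.out = 100), times = 2),
  coin  = rep(c("COIN_A", "COIN_B"), each = 100),
  close = c(100 * cumprod(1 + rnorm(100, sd = 0.02)),
            10 * cumprod(1 + rnorm(100, sd = 0.08)))
)

# simple daily return within each coin; the first day of each coin has no return
prices$pct_change &lt;- ave(prices$close, prices$coin,
                         FUN = function(p) c(NA, p[-1] / p[-length(p)] - 1))

# standard deviation of the daily returns (in %) per coin
volatility &lt;- tapply(100 * prices$pct_change, prices$coin, sd, na.rm = TRUE)
print(round(volatility, 2))</code></pre>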



<p id="1715">Let’s look at the percentage daily returns for Bitcoin and Litecoin, since they seem to be highly correlated. I am going to zoom in on the period between 2016-02-01 and 2016-05-01.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70486307144165px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="start_date <- ymd(&quot;2016-02-01&quot;)
end_date <- ymd(&quot;2016-05-01&quot;)
p <- ggplot(alt_data[pair_usdt %like% &quot;BTC|LTC&quot; & Date > start_date & Date < end_date], aes(x = Date, y =  (100*pct_change), col = pair_usdt)) + geom_line() + theme_bw() + ylab(&quot;Daily Return (%)&quot;)
p" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">start_date </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2016-02-01&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">end_date </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2016-05-01&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[pair_usdt </span><span style="color: #F92672">%like%</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC|LTC&quot;</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> Date </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> start_date </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> Date </span><span style="color: #F92672"><</span><span style="color: #F8F8F2"> end_date], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  (</span><span style="color: #AE81FF">100</span><span style="color: #F92672">*</span><span style="color: #F8F8F2">pct_change), </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Price (USD)&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">p</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="480" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_f0yWVUQ02uKmgNS3Euz8cA.webp" alt="" class="wp-image-4920" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_f0yWVUQ02uKmgNS3Euz8cA.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_f0yWVUQ02uKmgNS3Euz8cA-500x290.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_f0yWVUQ02uKmgNS3Euz8cA-150x87.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_f0yWVUQ02uKmgNS3Euz8cA-768x445.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Daily Return for Bitcoin and LTC in 2018</figcaption></figure>
</div>


<p id="db3e">The figure shows a clear correlation between the daily returns of Bitcoin and Litecoin. It also suggests that this correlation varies over time, so let’s look at how it changes from month to month.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395828247070312px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# these steps are similar to the ones in the previous section; the only difference is that we now look at the daily percentage change in price instead of the actual price
# subset data, only keep the date, the pair, and the daily percentage change
alt_data_sub_pct <- alt_data[, .(Date, pair_usdt, pct_change)]
# convert to wide format 
alt_data_sub_pct <- spread(data = alt_data_sub_pct, key = &quot;pair_usdt&quot;, value = &quot;pct_change&quot;)
# clean column names
setnames(alt_data_sub_pct, gsub(&quot;USDT_&quot;, &quot;&quot;, colnames(alt_data_sub_pct)))
# subset the data
btc_ltc <- alt_data_sub_pct[, .(Date, BTC, LTC)]
# add a year_month column
btc_ltc[, year_month := as.yearmon(Date)]
# calculate the correlation coefficient on a monthly basis
btc_ltc_2 <- btc_ltc[, cor(BTC, LTC), by = year_month]
# now plot the correlation coefficient as a function of month and year
plot(btc_ltc_2$year_month, btc_ltc_2$V1, xlab = &quot;Year-Month&quot;, main = &quot;Correlation Coeff. Between Daily Returns of BTC and LTC Over time&quot;
     , ylab = &quot;Correlation Coefficient&quot;, type = &quot;b&quot;, pch = 19, col = ifelse(btc_ltc_2$V1 > 0, &quot;blue&quot;, &quot;red&quot;)
     , ylim = c(-1, 1))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># these steps are similar to the ones in the previous section; the only difference is that we now look at the daily percentage change in price instead of the actual price</span></span>
<span class="line"><span style="color: #88846F"># subset data, only keep the date, the pair, and the daily percentage change</span></span>
<span class="line"><span style="color: #F8F8F2">alt_data_sub_pct </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> alt_data[, .(Date, pair_usdt, pct_change)]</span></span>
<span class="line"><span style="color: #88846F"># convert to wide format </span></span>
<span class="line"><span style="color: #F8F8F2">alt_data_sub_pct </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> spread(</span><span style="color: #FD971F">data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> alt_data_sub_pct, </span><span style="color: #FD971F">key</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;pair_usdt&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">value</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;pct_change&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #88846F"># clean column names</span></span>
<span class="line"><span style="color: #F8F8F2">setnames(alt_data_sub_pct, </span><span style="color: #66D9EF">gsub</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;USDT_&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #66D9EF">colnames</span><span style="color: #F8F8F2">(alt_data_sub_pct)))</span></span>
<span class="line"><span style="color: #88846F"># subset the data</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> alt_data_sub_pct[, .(Date, BTC, LTC)]</span></span>
<span class="line"><span style="color: #88846F"># add a year_month column</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc[, year_month </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> as.yearmon(Date)]</span></span>
<span class="line"><span style="color: #88846F"># calculate the correlation coefficient on a monthly basis</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc_2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_ltc[, </span><span style="color: #66D9EF">cor</span><span style="color: #F8F8F2">(BTC, LTC), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> year_month]</span></span>
<span class="line"><span style="color: #88846F"># now plot the correlation coefficient as a function of month and year</span></span>
<span class="line"><span style="color: #66D9EF">plot</span><span style="color: #F8F8F2">(btc_ltc_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">year_month, btc_ltc_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1, </span><span style="color: #FD971F">xlab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Year-Month&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">main</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coeff. Between Daily Returns of BTC and LTC Over time&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coefficient&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">type</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;b&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">pch</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">19</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">ifelse</span><span style="color: #F8F8F2">(btc_ltc_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1 </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;blue&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;red&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylim</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="519" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_YozYWrtQM_qHssvavSNHFw-1.webp" alt="" class="wp-image-4922" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_YozYWrtQM_qHssvavSNHFw-1.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_YozYWrtQM_qHssvavSNHFw-1-500x313.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_YozYWrtQM_qHssvavSNHFw-1-150x94.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_YozYWrtQM_qHssvavSNHFw-1-768x481.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p id="62de">Interestingly, the correlation between the daily percentage price changes of Bitcoin and Litecoin is mostly positive: there is only one month in which it is negative, and only barely. This is quite different from what we saw between Bitcoin and DASH, but that comparison used the actual prices rather than the daily returns. Let’s redo this plot, this time for the actual prices of Bitcoin and Litecoin, just like we did with DASH.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395828247070312px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# subset the data
btc_ltc_price <- alt_data_sub[, .(Date, BTC, LTC)]
# add a year_month column
btc_ltc_price[, year_month := as.yearmon(Date)]
# calculate the correlation coefficient on a monthly basis
btc_ltc_price_2 <- btc_ltc_price[, cor(BTC, LTC), by = year_month]
# now plot the correlation coefficient as a function of month and year
plot(btc_ltc_price_2$year_month, btc_ltc_price_2$V1, xlab = &quot;Year-Month&quot;, main = &quot;Correlation Coeff. Between BTC and Litecoin Over time&quot;
     , ylab = &quot;Correlation Coefficient&quot;, type = &quot;b&quot;, pch = 19, col = ifelse(btc_ltc_price_2$V1 > 0, &quot;blue&quot;, &quot;red&quot;)
     , ylim = c(-1, 1))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># subset the data</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc_price </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> alt_data_sub[, .(Date, BTC, LTC)]</span></span>
<span class="line"><span style="color: #88846F"># add a year_month column</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc_price[, year_month </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> as.yearmon(Date)]</span></span>
<span class="line"><span style="color: #88846F"># calculate the correlation coefficient on a monthly basis</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc_price_2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_ltc_price[, </span><span style="color: #66D9EF">cor</span><span style="color: #F8F8F2">(BTC, LTC), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> year_month]</span></span>
<span class="line"><span style="color: #88846F"># now plot the correlation coefficient as a function of month and year</span></span>
<span class="line"><span style="color: #66D9EF">plot</span><span style="color: #F8F8F2">(btc_ltc_price_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">year_month, btc_ltc_price_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1, </span><span style="color: #FD971F">xlab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Year-Month&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">main</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coeff. Between BTC and Litecoin Over time&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coefficient&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">type</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;b&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">pch</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">19</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">ifelse</span><span style="color: #F8F8F2">(btc_ltc_price_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1 </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;blue&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;red&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylim</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span></code></pre></div>



<p id="494c">The monthly correlations of the daily returns of Bitcoin and Litecoin (boy, this is a mouthful) follow trends very similar to those of the actual prices.</p>
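
<p>Correlations computed on prices and on daily returns need not agree in general, because two price series that both drift upward can look strongly correlated even when their daily moves are unrelated. The following is a minimal sketch on simulated data, not on the Poloniex data used in this post; the names <code>ret_a</code>, <code>ret_b</code>, <code>price_a</code>, and <code>price_b</code> are made up for illustration, and the drift and volatility values are arbitrary assumptions.</p>

```r
# Toy illustration on simulated data (assumed values, not the real coin data)
set.seed(42)
n <- 250

# two independent series of daily returns (as fractions, not percent)
ret_a <- rnorm(n, mean = 0.004, sd = 0.01)
ret_b <- rnorm(n, mean = 0.004, sd = 0.01)

# build price paths from the returns; both paths drift upward
price_a <- 100 * cumprod(1 + ret_a)
price_b <- 50 * cumprod(1 + ret_b)

# daily fractional change recovered from the prices
ret_from_price <- diff(price_a) / head(price_a, -1)

# correlation of the price levels vs. correlation of the daily returns
cor_prices  <- cor(price_a, price_b)
cor_returns <- cor(ret_a, ret_b)

print(c(prices = cor_prices, returns = cor_returns))
```

<p>In this toy setup the price correlation is driven by the shared upward drift, while the return correlation reflects only the day-to-day co-movement. This is one reason to look at both views, as we did above.</p>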



<p id="2a55">In the next post we’ll do something more statistically sound: rolling correlations.</p>



<p>Read More blogs in AnalyticaDSS Blogs here : <a href="https://analyticadss.com/blog">BLOGS</a></p>



<p>Read More blogs in Medium : <a href="https://medium.com/@aousabdo">Medium Blogs</a></p>



<p>Read More blogs in R-bloggers : <a href="https://www.r-bloggers.com/">https://www.r-bloggers.com</a></p>
<p>The post <a href="https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-2/">Analyzing Crypto Market using R — Part 2</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Analyzing Crypto Markets using R — Part 1</title>
		<link>https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-1/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Sat, 01 Dec 2018 10:19:38 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[R Statistical Language]]></category>
		<category><![CDATA[Bitcoin]]></category>
		<category><![CDATA[Cryptocurrency]]></category>
		<category><![CDATA[R]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=4897</guid>

					<description><![CDATA[<p>Downloading and Processing Crypto Data with R Analyzing crypto market Aous Abdo, WWW.ANALYTICADSS.COM An interactive version of this post can be found here. No doubt that crypto currencies with all the promises they bring, both financially and otherwise, are only here to stay. As a data scientist interested in data and numbers, I thought it would be nice [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-1/">Analyzing Crypto Markets using R — Part 1</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading" id="e5da">Downloading and Processing Crypto Data with R</h2>



<p>Analyzing crypto market</p>



<p><a href="https://medium.com/u/4f20dbfad286?source=post_page-----9e0d1bff7c63--------------------------------" rel="noreferrer noopener" target="_blank">Aous Abdo</a>, <a href="http://www.analyticadss.com/" rel="noreferrer noopener" target="_blank">WWW.ANALYTICADSS.COM</a><br>An interactive version of this post can be found <a href="https://analyticadss.com/adss_blog/crypto_notebook_part1.nb.html" rel="noreferrer noopener" target="_blank">here</a>.</p>



<p id="36bb">There is little doubt that cryptocurrencies, with all the promises they bring, both financial and otherwise, are here to stay. As a data scientist interested in data and numbers, I thought it would be nice to take a look at some cryptocurrencies with my favorite tool, <a href="https://cran.r-project.org/" target="_blank" rel="noreferrer noopener"><strong>R</strong></a>.</p>



<h2 class="wp-block-heading" id="65a0">R Libraries</h2>



<p id="33d5">Below is a list of <strong>R</strong> libraries we will be using to help us with our analysis. Not all of them are strictly necessary, but they all make our life easier.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395828247070312px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="library(PoloniexR)
library(data.table)
library(lubridate)
library(Quandl)
library(plyr)
library(stringr)
library(ggplot2)
library(plotly)
library(janitor)
library(quantmod)
library(pryr)
library(corrplot)
library(PerformanceAnalytics)
library(tidyr)
library(MLmetrics)
library(readr)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(PoloniexR)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(lubridate)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(Quandl)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(plyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(stringr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(ggplot2)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(plotly)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(janitor)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(quantmod)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(pryr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(corrplot)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(PerformanceAnalytics)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(MLmetrics)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(readr)</span></span></code></pre></div>
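
<p>Before running the rest of the post, it can help to check which of these packages are actually installed. The snippet below is a small sketch of one way to do that with base R only; the variable names <code>pkgs</code>, <code>installed</code>, and <code>missing_pkgs</code> are hypothetical, and the install step is left commented out because it assumes every package is still available from your configured repository.</p>

```r
# packages loaded in the block above
pkgs <- c("PoloniexR", "data.table", "lubridate", "Quandl", "plyr", "stringr",
          "ggplot2", "plotly", "janitor", "quantmod", "pryr", "corrplot",
          "PerformanceAnalytics", "tidyr", "MLmetrics", "readr")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# names of the packages that could not be found
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to try installing them
} else {
  message("All packages are installed.")
}
```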



<h2 class="wp-block-heading" id="f526">Getting the Data</h2>



<h2 class="wp-block-heading" id="2c1e">1. PoloniexR Package</h2>



<p id="69f7">The easiest way to get current and historical data for <strong>crypto </strong>currencies is by using the <strong><a href="https://cran.r-project.org/web/packages/PoloniexR/index.html" target="_blank" rel="noreferrer noopener">PoloniexR</a> </strong>package developed by <em>Vermeir Jellen</em>. <em>Vermeir Jellen </em>gives a good tutorial on how to get started with his package <a href="https://github.com/VermeirJellen/PoloniexR" target="_blank" rel="noreferrer noopener"><strong>here</strong></a>. The <a href="https://poloniex.com/exchange" target="_blank" rel="noreferrer noopener"><strong>Poloniex exchange</strong></a> includes many coins but not all. For coins missing on Poloniex, one can scrape the <a href="http://rstudio-pubs-static.s3.amazonaws.com/www.coinmarketcap.com" target="_blank" rel="noreferrer noopener">coinmarketcap</a> page; an example is given here.</p>



<h2 class="wp-block-heading" id="34de">2. Quandl</h2>



<p id="11e7"><strong><a href="https://www.quandl.com/" target="_blank" rel="noreferrer noopener">Quandl</a> </strong>is my go-to place for any financial data. Their free-tier API has lots of good data one can use, and <strong>Quandl </strong>offers data from multiple exchanges. Locating crypto data on <strong>Quandl </strong>is not straightforward, however. After spending a few hours on their site, I found that most of the crypto data can be found <a href="https://www.quandl.com/data/BITFINEX-Bitfinex" target="_blank" rel="noreferrer noopener">here</a>.</p>



<p id="5a62">First, let’s take a look at different exchange data for Bitcoin using <strong>Quandl</strong>. We will download and plot historical Bitcoin data from the following exchanges: Kraken, <strong>Coinbase</strong>, <strong>Bitstamp</strong>, and ITBIT.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# enable your Quandl API key
my_quandl_api_key <- read_file(&quot;../../quandl_api_key.txt&quot;)
Quandl.api_key(my_quandl_api_key)
# function to download quandl data
get_quandl_data <- function(data_source = &quot;BITFINEX&quot;
                            , pair = 'btcusd'
                            , ...){
  
  # make sure the user supplied the correct data_source
  if(toupper(data_source) != &quot;BITFINEX&quot;) stop(&quot;data source supplied is wrong...&quot;)
  # quandl is case sensitive, all codes have to be upper case
  pair <- toupper(pair)
  tmp <- NA
  try(tmp <- Quandl(code = toupper(paste(data_source, pair, sep = &quot;/&quot;)), ...), silent = TRUE)
  return(tmp)
}
# get btc data from different exchanges
  exchange_data <- list()
  
  exchanges <- c('KRAKENUSD','COINBASEUSD','BITSTAMPUSD','ITBITUSD')
  
  for (i in exchanges){
    exchange_data[[i]] <- Quandl(paste0('BCHARTS/', i))
  }" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># enable your Quandl API key</span></span>
<span class="line"><span style="color: #F8F8F2">my_quandl_api_key </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> read_file(</span><span style="color: #E6DB74">&quot;../../quandl_api_key.txt&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">Quandl.api_key(my_quandl_api_key)</span></span>
<span class="line"><span style="color: #88846F"># function to download quandl data</span></span>
<span class="line"><span style="color: #A6E22E">get_quandl_data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">data_source</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BITFINEX&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                            , </span><span style="color: #FD971F">pair</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&#39;btcusd&#39;</span></span>
<span class="line"><span style="color: #F8F8F2">                            , </span><span style="color: #F92672">...</span><span style="color: #F8F8F2">){</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># make sure the user supplied the correct data_source</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">if</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">toupper</span><span style="color: #F8F8F2">(data_source) </span><span style="color: #F92672">!=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BITFINEX&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #66D9EF">stop</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;data source supplied is wrong...&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># quandl is case sensitive, all codes have to be upper case</span></span>
<span class="line"><span style="color: #F8F8F2">  pair </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">toupper</span><span style="color: #F8F8F2">(pair)</span></span>
<span class="line"><span style="color: #F8F8F2">  tmp </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">NA</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #66D9EF">try</span><span style="color: #F8F8F2">(tmp </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> Quandl(</span><span style="color: #FD971F">code</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">toupper</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">paste</span><span style="color: #F8F8F2">(data_source, pair, </span><span style="color: #FD971F">sep</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;/&quot;</span><span style="color: #F8F8F2">)), </span><span style="color: #F92672">...</span><span style="color: #F8F8F2">), </span><span style="color: #FD971F">silent</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">return</span><span style="color: #F8F8F2">(tmp)</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span>
<span class="line"><span style="color: #88846F"># get btc data from different exchanges</span></span>
<span class="line"><span style="color: #F8F8F2">  exchange_data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list</span><span style="color: #F8F8F2">()</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  exchanges </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&#39;KRAKENUSD&#39;</span><span style="color: #F8F8F2">,</span><span style="color: #E6DB74">&#39;COINBASEUSD&#39;</span><span style="color: #F8F8F2">,</span><span style="color: #E6DB74">&#39;BITSTAMPUSD&#39;</span><span style="color: #F8F8F2">,</span><span style="color: #E6DB74">&#39;ITBITUSD&#39;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">for</span><span style="color: #F8F8F2"> (i </span><span style="color: #F92672">in</span><span style="color: #F8F8F2"> exchanges){</span></span>
<span class="line"><span style="color: #F8F8F2">    exchange_data[[i]] </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> Quandl(</span><span style="color: #66D9EF">paste0</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&#39;BCHARTS/&#39;</span><span style="color: #F8F8F2">, i))</span></span>
<span class="line"><span style="color: #F8F8F2">  }</span></span></code></pre></div>



<p id="935a">Next, we need to combine this list of BTC prices from the different exchanges into a single <strong>data frame </strong>so we can plot them together.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.7048797607421875px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# put them all in one dataframe to plot in ggplot2
btc_usd <- do.call(&quot;rbind&quot;, exchange_data)
btc_usd$exchange <- row.names(btc_usd)
btc_usd <- as.data.table(btc_usd)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># put them all in one dataframe to plot in ggplot2</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">do.call</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;rbind&quot;</span><span style="color: #F8F8F2">, exchange_data)</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">exchange </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">row.names</span><span style="color: #F8F8F2">(btc_usd)</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(btc_usd)</span></span></code></pre></div>
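


<p>To see why the exchange name can be recovered from the row names, here is a minimal sketch using two made-up data frames in place of the Quandl downloads (the names <code>toy_data</code>, <code>toy_all</code> and <code>toy_dt</code> and all prices are illustrative assumptions, not real data). When <code>rbind</code> is given a named list of data frames, it prefixes each row name with the name of the list element the row came from.</p>



<pre class="wp-block-code"><code>library(data.table)

# toy stand-ins for two exchange downloads (prices are made up)
toy_data &lt;- list(
  KRAKENUSD   = data.frame(date = as.Date(c("2018-01-01", "2018-01-02")),
                           weighted_price = c(13500, 14000)),
  COINBASEUSD = data.frame(date = as.Date(c("2018-01-01", "2018-01-02")),
                           weighted_price = c(13600, 14100))
)

# base R: the list names end up as row-name prefixes
toy_all &lt;- do.call("rbind", toy_data)
row.names(toy_all)
# "KRAKENUSD.1" "KRAKENUSD.2" "COINBASEUSD.1" "COINBASEUSD.2"

# data.table alternative: idcol stores the list names in a column
toy_dt &lt;- rbindlist(toy_data, idcol = "exchange")
unique(toy_dt$exchange)
# "KRAKENUSD" "COINBASEUSD"</code></pre>



<p>If you would rather skip the row-name step, <code>rbindlist()</code> with <code>idcol</code> is one possible alternative, since it writes the exchange name into its own column directly.</p>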



<p>The combined table needs some minor cleaning: we extract the exchange name from the row names, standardize the column names, and get rid of rows with a weighted price of 0.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# some data cleaning
btc_usd[, exchange := as.factor(str_extract(exchange, &quot;[A-Z]+&quot;))]
btc_usd <- clean_names(btc_usd)
btc_usd <- btc_usd[weighted_price > 0]
# set datatable key to be the date column
setkey(btc_usd, date)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># some data cleaning</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd[, exchange </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">as.factor</span><span style="color: #F8F8F2">(str_extract(exchange, </span><span style="color: #E6DB74">&quot;[A-Z]+&quot;</span><span style="color: #F8F8F2">))]</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> clean_names(btc_usd)</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_usd[weighted_price </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">]</span></span>
<span class="line"><span style="color: #88846F"># set datatable key to be the date column</span></span>
<span class="line"><span style="color: #F8F8F2">setkey(btc_usd, date)</span></span></code></pre></div>
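


<p>The cleaning step can be tried on a toy table first. The sketch below assumes row names of the form produced by the <code>rbind</code> step (for example <code>KRAKENUSD.1</code>); the table <code>toy</code> and its values are made up for illustration.</p>



<pre class="wp-block-code"><code>library(data.table)
library(stringr)

# made-up rows: the exchange column still carries the ".1", ".2" suffixes
toy &lt;- data.table(exchange = c("KRAKENUSD.1", "KRAKENUSD.2", "ITBITUSD.10"),
                  weighted_price = c(13500, 0, 13650))

# keep only the leading run of capital letters, which drops the numeric suffix
toy[, exchange := as.factor(str_extract(exchange, "[A-Z]+"))]

# drop rows with a weighted price of 0
toy &lt;- toy[weighted_price &gt; 0]

levels(toy$exchange)
# "ITBITUSD" "KRAKENUSD"</code></pre>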



<p id="4b66">Let’s take a look at the data table we just made.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="head(btc_usd)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">head</span><span style="color: #F8F8F2">(btc_usd)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="245" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_2OI9hXg9BGzha6OQ-KkIsg.webp" alt="" class="wp-image-4898" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_2OI9hXg9BGzha6OQ-KkIsg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_2OI9hXg9BGzha6OQ-KkIsg-500x148.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_2OI9hXg9BGzha6OQ-KkIsg-150x44.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_2OI9hXg9BGzha6OQ-KkIsg-768x227.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p id="ed3e">The data includes 10 columns: the date, <strong>OHLC </strong>prices, volumes in USD and BTC, the weighted price, and the exchange. I wish I had bought me some <em>bitcoin </em>back in 2011!!!</p>



<p id="5c06">Now we’ll look at the price of bitcoin and color code it by exchange.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704833984375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="ggplot(btc_usd, aes(x = date, y = weighted_price, col = exchange)) + geom_line() + theme_bw()" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">ggplot(btc_usd, aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> 
weighted_price, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> exchange)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw()</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="497" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_pKZF3UsW8ntNBCpd7RhgTA.webp" alt="" class="wp-image-4899" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_pKZF3UsW8ntNBCpd7RhgTA.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_pKZF3UsW8ntNBCpd7RhgTA-500x300.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_pKZF3UsW8ntNBCpd7RhgTA-150x90.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_pKZF3UsW8ntNBCpd7RhgTA-768x461.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<h2 class="wp-block-heading" id="7d74">Arbitrage</h2>



<p id="0a27">It appears that the prices of <strong>btc </strong>on different exchanges are fairly consistent. But this is an artifact of the figure, since prices span several orders of magnitude over the timeline we selected. To better see any price differences, we need to zoom in on the figure. Let’s zoom in on, say, the first month of <strong>2018</strong>, when we had the <strong>ATH </strong>for all coins.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70489501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="ggplot(btc_usd[date >= ymd(&quot;2018-01-01&quot;) & date <= ymd(&quot;2018-01-31&quot;)], aes(x = date, y = weighted_price, col = exchange)) + geom_line() + theme_bw()" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">ggplot(btc_usd[date </span><span style="color: #F92672">>=</span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2018-01-01&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">&</span><span style="color: 
#F8F8F2"> date </span><span style="color: #F92672"><=</span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2018-01-31&quot;</span><span style="color: #F8F8F2">)], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> weighted_price, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> exchange)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw()</span></span></code></pre></div>



<p>There are obvious differences in prices between the exchanges, and the differences seem to vary over time as well. It will be interesting to look at the maximum price difference as a function of time, so let’s do that.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70489501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# first let's find the minimum price by date
btc_usd[, min_price := min(weighted_price), by = date]
# now we need to find the price difference between the price for each day and the minimum price for that day
# but since the price of bitcoin varies a lot for the time period under study, we need to normalize the price difference
# to do that we will just divide by the median price for each day
btc_usd[, price_diff := 100*(weighted_price - min_price)/median(weighted_price), by = (date)]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># first let&#39;s find the minimum price by date</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd[, min_price </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">min</span><span style="color: #F8F8F2">(weighted_price), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date]</span></span>
<span class="line"><span style="color: #88846F"># now we need to find the price difference between the price for each day and the minimum price for that day</span></span>
<span class="line"><span style="color: #88846F"># but since the price of bitcoin varies a lot for the time period under study, we need to normalize the price difference</span></span>
<span class="line"><span style="color: #88846F"># to do that we will just divide by the median price for each day</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd[, price_diff </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">100</span><span style="color: #F92672">*</span><span style="color: #F8F8F2">(weighted_price </span><span style="color: #F92672">-</span><span style="color: #F8F8F2"> min_price)</span><span style="color: #F92672">/</span><span style="color: #66D9EF">median</span><span style="color: #F8F8F2">(weighted_price), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> (date)]</span></span></code></pre></div>
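


<p>As a quick sanity check of the normalization, here is the same calculation on three made-up prices for a single day (the exchanges <code>A</code>, <code>B</code>, <code>C</code> and the prices are assumptions for illustration). With prices of 100, 102 and 101, the minimum is 100 and the median is 101, so the differences should come out as 0%, about 1.98% and about 0.99%.</p>



<pre class="wp-block-code"><code>library(data.table)

# one day, three hypothetical exchanges
toy &lt;- data.table(
  date = as.Date(rep("2018-01-01", 3)),
  exchange = c("A", "B", "C"),
  weighted_price = c(100, 102, 101)
)

# daily minimum, then the percentage difference relative to the daily median
toy[, min_price := min(weighted_price), by = date]
toy[, price_diff := 100 * (weighted_price - min_price) / median(weighted_price), by = date]

round(toy$price_diff, 2)
# 0.00 1.98 0.99</code></pre>



<p>The exchange with the lowest price always gets a difference of 0, so on days with data from only one exchange every value is 0.</p>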



<p>Now we have a new column that gives, for each exchange and day, the price difference from that day’s minimum price as a percentage of that day’s median price. Let’s take a look at the new table.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="tail(btc_usd)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">tail</span><span style="color: #F8F8F2">(btc_usd)</span></span></code></pre></div>



<p id="574b">The reason I looked at the newest dates is that prior to <strong>2014 </strong>we only have data for one exchange, so all the price differences there are <strong>0</strong>. Let’s take a look at the price differences as a function of time.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704833984375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# first we need to create a new data table with only the maximum price difference per day
tmp <- btc_usd[, .(price_diff = max(price_diff)), by = date]
# This will help us visualize overlapping points
MyGray <- rgb(t(col2rgb(&quot;black&quot;)), alpha=50, maxColorValue=255)
tmp[, plot(date, price_diff, pch=20, col = MyGray, xlab = &quot;Date&quot;, ylab = &quot;Maximum of Price Difference (%)&quot;)]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># first we need to create a new data table with only the maximum price difference per day</span></span>
<span class="line"><span style="color: #F8F8F2">tmp </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_usd[, .(</span><span style="color: #FD971F">price_diff</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">max</span><span style="color: #F8F8F2">(price_diff)), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date]</span></span>
<span class="line"><span style="color: #88846F"># This will help us visualize overlapping points</span></span>
<span class="line"><span style="color: #F8F8F2">MyGray </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">rgb</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">t</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">col2rgb</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;black&quot;</span><span style="color: #F8F8F2">)), </span><span style="color: #FD971F">alpha</span><span style="color: #F92672">=</span><span style="color: #AE81FF">50</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">maxColorValue</span><span style="color: #F92672">=</span><span style="color: #AE81FF">255</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">tmp[, </span><span style="color: #66D9EF">plot</span><span style="color: #F8F8F2">(date, price_diff, </span><span style="color: #FD971F">pch</span><span style="color: #F92672">=</span><span style="color: #AE81FF">20</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> MyGray, </span><span style="color: #FD971F">xlab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Date&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">ylab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Maximum of Price Difference (%)&quot;</span><span style="color: #F8F8F2">)]</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="450" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_OCmdv1nAXyXQXtM-V_p4rw.webp" alt="" class="wp-image-4900" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_OCmdv1nAXyXQXtM-V_p4rw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_OCmdv1nAXyXQXtM-V_p4rw-500x272.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_OCmdv1nAXyXQXtM-V_p4rw-150x82.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_OCmdv1nAXyXQXtM-V_p4rw-768x417.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>
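

<p>data.table offers two ways to compute a per-day maximum, and they behave differently, so it is worth knowing which one you are using. The sketch below uses a made-up table (<code>toy</code>, <code>daily_max</code> and <code>in_place</code> are illustrative names): aggregating with <code>.()</code> returns a new table with one row per date, while <code>:=</code> keeps every row and overwrites the column by reference.</p>



<pre class="wp-block-code"><code>library(data.table)

# two days, two hypothetical exchanges per day
toy &lt;- data.table(
  date = as.Date(c("2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02")),
  price_diff = c(0, 1.5, 0, 0.8)
)

# aggregation: a new table with one row per date
daily_max &lt;- toy[, .(price_diff = max(price_diff)), by = date]

# update by reference (on a copy, so toy itself is left untouched):
# same number of rows, each row now carrying its date's maximum
in_place &lt;- copy(toy)[, price_diff := max(price_diff), by = date]

nrow(daily_max)  # 2
nrow(in_place)   # 4</code></pre>



<p>With the <code>:=</code> form, a scatter plot draws one point per exchange per day at the same height, which is why semi-transparent points appear darker where they overlap.</p>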


<p id="8ee2">Let’s show the plot with a log scale on the <strong>y axis</strong>. Since the logarithm of zero is undefined, we also discard dates with a zero price difference.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704833984375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="tmp[price_diff > 0, plot(date, price_diff, pch=20, col = MyGray, log = &quot;y&quot; , xlab = &quot;Date&quot;, ylab = &quot;Maximum of Price Difference (%)&quot;)]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">tmp[price_diff </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">, </span><span style="color: #66D9EF">plot</span><span style="color: #F8F8F2">(date, price_diff, 
</span><span style="color: #FD971F">pch</span><span style="color: #F92672">=</span><span style="color: #AE81FF">20</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> MyGray, </span><span style="color: #FD971F">log</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;y&quot;</span><span style="color: #F8F8F2"> , </span><span style="color: #FD971F">xlab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Date&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">ylab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Maximum of Price Difference (%)&quot;</span><span style="color: #F8F8F2">)]</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="464" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_hW8T7SyWYrpL4FWYN7Y5sg.webp" alt="" class="wp-image-4901" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_hW8T7SyWYrpL4FWYN7Y5sg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_hW8T7SyWYrpL4FWYN7Y5sg-500x280.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_hW8T7SyWYrpL4FWYN7Y5sg-150x84.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_hW8T7SyWYrpL4FWYN7Y5sg-768x430.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p>As the figure above shows, most of the maximum differences in bitcoin prices between the exchanges fall in the <strong>0.5–2.0%</strong> range. It is also interesting that the differences seem to have come down between 2014 and 2016, and then to have gone up again starting in <strong>2017</strong>. Let’s fit a GAM (generalized additive model) smooth to see what we get.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.703125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# we'll use ggplot and fit a gam smooth line
ggplot(tmp[price_diff > 0 ], aes(x = date, y = price_diff)) + geom_point(alpha = 0.2, shape = 16, size = 3, show.legend = FALSE) + scale_y_continuous(trans='log10') + geom_smooth(method = &quot;gam&quot;, formula = y ~ s(x, bs = &quot;cs&quot;)) + theme_bw() + xlab(&quot;Date&quot;) + ylab(&quot;Maximum of Price Difference (%)&quot;)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># we&#39;ll use ggplot and fit a gam smooth line</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot(tmp[price_diff </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2"> ], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> price_diff)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_point(</span><span style="color: #FD971F">alpha</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0.2</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">shape</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">16</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">size</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">show.legend</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">FALSE</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> scale_y_continuous(</span><span style="color: #FD971F">trans</span><span style="color: #F92672">=</span><span style="color: #E6DB74">&#39;log10&#39;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_smooth(</span><span style="color: #FD971F">method</span><span style="color: #F8F8F2"> </span><span style="color: 
#F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;gam&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">formula</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> y </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> s(x, </span><span style="color: #FD971F">bs</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;cs&quot;</span><span style="color: #F8F8F2">)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> xlab(</span><span style="color: #E6DB74">&quot;Date&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Maximum of Price Difference (%)&quot;</span><span style="color: #F8F8F2">)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="492" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_6-thiHBpH1oNCkiqPmFugA.webp" alt="" class="wp-image-4902" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_6-thiHBpH1oNCkiqPmFugA.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_6-thiHBpH1oNCkiqPmFugA-500x297.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_6-thiHBpH1oNCkiqPmFugA-150x89.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_6-thiHBpH1oNCkiqPmFugA-768x456.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>
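
<p>For readers who want the fitted model itself rather than only the drawn line, a smooth of the same form can be fitted directly with <strong>mgcv</strong>. This is a minimal sketch on synthetic data, not the fit shown in the figure: the table <code>toy</code>, its values, and the choice of <code>method = "REML"</code> are assumptions made for illustration. Because the plot puts the y-axis on a log10 scale before smoothing, the sketch models <code>log10(price_diff)</code> against the date converted to a number.</p>

<pre class="wp-block-code"><code># Minimal sketch on synthetic data; 'toy' and its values are invented for illustration
library(data.table)
library(mgcv)

set.seed(42)
toy &lt;- data.table(date = seq(as.Date("2014-01-01"), by = "day", length.out = 400))
# strictly positive synthetic "price differences" (in percent)
toy[, price_diff := exp(rnorm(.N, mean = 0, sd = 0.5))]
# the smoother sees dates as numbers (days since 1970-01-01)
toy[, x := as.numeric(date)]

# fit the smooth on the log10 scale, mirroring scale_y_continuous(trans = 'log10')
fit &lt;- gam(log10(price_diff) ~ s(x, bs = "cs"), data = toy, method = "REML")

# back-transform the fitted values to the original percent scale
toy[, fitted_diff := 10^as.numeric(predict(fit))]
summary(fit)</code></pre>

<p>The effective degrees of freedom reported by <code>summary(fit)</code> give a rough sense of how wiggly the smooth is, which helps when judging whether a bump such as the one in late 2015 is more than noise.</p>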


<p>The regression line shows a hint of an overall downtrend from <strong>2014</strong> to mid-<strong>2016</strong>, except for an uptrend lasting a few months in late <strong>2015</strong>. The trend turns upward in mid to late <strong>2017</strong>, and we again see a downward movement in price differences starting in December 2017. This can be seen better in the <em>box-plot</em> figure below.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.703125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="tmp[, month_year := format(as.Date(date), &quot;%Y-%m&quot;)]
ggplot(tmp[price_diff > 0], aes(x = month_year, y = price_diff)) + geom_boxplot() + scale_y_continuous(trans='log10') + xlab(&quot;Date (Year-Month)&quot;) + ylab(&quot;Maximum of Price Difference (%)&quot;) + theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">tmp[, month_year </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">format</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">as.Date</span><span style="color: #F8F8F2">(date), </span><span style="color: #E6DB74">&quot;%Y-%m&quot;</span><span style="color: #F8F8F2">)]</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot(tmp[price_diff </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> month_year, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> price_diff)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_boxplot() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> scale_y_continuous(</span><span style="color: #FD971F">trans</span><span style="color: #F92672">=</span><span style="color: #E6DB74">&#39;log10&#39;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> xlab(</span><span style="color: #E6DB74">&quot;Date (Year-Month)&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Maximum of Price Difference (%)&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme(</span><span style="color: #FD971F">axis.text.x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> element_text(</span><span style="color: #FD971F">angle</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">90</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">hjust</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> 
</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="499" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_KKc4nGtHQg6RBCxzoKkaMw.webp" alt="" class="wp-image-4903" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_KKc4nGtHQg6RBCxzoKkaMw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_KKc4nGtHQg6RBCxzoKkaMw-500x301.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_KKc4nGtHQg6RBCxzoKkaMw-150x90.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_KKc4nGtHQg6RBCxzoKkaMw-768x463.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p id="60f8">The <em>box-plot</em> figure above shows how the maximum price differences vary over time. On the <strong>x-axis</strong> I grouped dates by month, since any period shorter than one month results in a congested figure.</p>
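
<p>The numbers behind each box can also be pulled out as a table. The sketch below groups by the same <code>month_year</code> key and computes the quartiles that a box plot draws. It runs on a synthetic table (<code>toy</code>, with invented values and one row per day), so the output says nothing about the real data.</p>

<pre class="wp-block-code"><code># Minimal sketch on synthetic data; 'toy' and its values are invented for illustration
library(data.table)

set.seed(1)
toy &lt;- data.table(date = seq(as.Date("2017-01-01"), as.Date("2017-03-31"), by = "day"))
toy[, price_diff := exp(rnorm(.N))]
# same monthly grouping key as in the box plot
toy[, month_year := format(date, "%Y-%m")]

# per-month quartiles: the box edges and the middle line of each box
monthly &lt;- toy[price_diff &gt; 0,
               .(n_days = .N,
                 q25    = quantile(price_diff, 0.25),
                 median = median(price_diff),
                 q75    = quantile(price_diff, 0.75)),
               by = month_year]
print(monthly)</code></pre>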



<p id="f970"><strong>Okay</strong>, now let’s find out which exchanges contribute the most to these price differences. That is, we are trying to determine which exchanges consistently sell bitcoin at a higher, or lower, price than the rest. We need to pull some numbers as below, and then we’ll make a bar plot to show the leading exchanges in each category.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# add two columns to our data table which will contain the minimum and maximum prices
tmp[, `:=`(day_max = max(weighted_price), day_min = min(weighted_price)), by = date]
# now put only the columns we care about in a new data.table
tmp2 <- tmp[price_diff > 0 , .(date, exchange, weighted_price, day_min, day_max)]
# notice how we excluded days with no price difference
# now we only want to keep the rows with the maximum and minimum daily prices
tmp2 <- tmp2[weighted_price == day_min | weighted_price == day_max]
# now we'll add a new column designating the price as being the minimum or maximum
tmp2[, max_min := ifelse(weighted_price == day_min, &quot;min&quot;, &quot;max&quot;)]
# clean the name of the exchange
tmp2[, exchange := gsub(&quot;USD&quot;, &quot;&quot;, exchange)]
# now we'll add a new column containing the exchange name and the min_max column
tmp2[, max_min_exchange := paste(max_min, exchange, sep = &quot;-&quot;)]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># add two columns to our data table which will contain the minimum and maximum prices</span></span>
<span class="line"><span style="color: #F8F8F2">tmp[, `:=`(</span><span style="color: #FD971F">day_max</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">max</span><span style="color: #F8F8F2">(weighted_price), </span><span style="color: #FD971F">day_min</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">min</span><span style="color: #F8F8F2">(weighted_price)), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date]</span></span>
<span class="line"><span style="color: #88846F"># now put only the columns we care about in a new data.table</span></span>
<span class="line"><span style="color: #F8F8F2">tmp2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> tmp[price_diff </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2"> , .(date, exchange, weighted_price, day_min, day_max)]</span></span>
<span class="line"><span style="color: #88846F"># notice how we excluded days with no price difference</span></span>
<span class="line"><span style="color: #88846F"># now we only want to keep the rows with the maximum and minimum daily prices</span></span>
<span class="line"><span style="color: #F8F8F2">tmp2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> tmp2[weighted_price </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> day_min </span><span style="color: #F92672">|</span><span style="color: #F8F8F2"> weighted_price </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> day_max]</span></span>
<span class="line"><span style="color: #88846F"># now we&#39;ll add a new column designating the price as being the minimum or maximum</span></span>
<span class="line"><span style="color: #F8F8F2">tmp2[, max_min </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">ifelse</span><span style="color: #F8F8F2">(weighted_price </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> day_min, </span><span style="color: #E6DB74">&quot;min&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;max&quot;</span><span style="color: #F8F8F2">)]</span></span>
<span class="line"><span style="color: #88846F"># clean the name of the exchange</span></span>
<span class="line"><span style="color: #F8F8F2">tmp2[, exchange </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">gsub</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;USD&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;&quot;</span><span style="color: #F8F8F2">, exchange)]</span></span>
<span class="line"><span style="color: #88846F"># now we&#39;ll add a new column containing the exchange name and the min_max column</span></span>
<span class="line"><span style="color: #F8F8F2">tmp2[, max_min_exchange </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">paste</span><span style="color: #F8F8F2">(max_min, exchange, </span><span style="color: #FD971F">sep</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;-&quot;</span><span style="color: #F8F8F2">)]</span></span></code></pre></div>



<p>In the chunk of code above we created a new table that contains the maximum and minimum prices for each day. The table also contains a categorical column showing which exchange each <strong>max/min</strong> price belongs to, and whether the price was a maximum or a minimum. Before we plot this table, let’s have a quick look at it.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="head(tmp2)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">head</span><span style="color: #F8F8F2">(tmp2)</span></span></code></pre></div>





<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="251" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_yKe1-J73s4kQWcCYbVKIgw.webp" alt="" class="wp-image-4904" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_yKe1-J73s4kQWcCYbVKIgw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_yKe1-J73s4kQWcCYbVKIgw-500x152.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_yKe1-J73s4kQWcCYbVKIgw-150x45.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_yKe1-J73s4kQWcCYbVKIgw-768x233.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p id="f700">The <strong>max_min_exchange</strong> column contains all the data we need, so let’s make a <strong>barplot</strong> of this variable. We’ll color the bars by the <strong>max_min</strong> column, which marks each price as a daily maximum or minimum.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70489501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# now make a barplot 
ggplot(tmp2, aes(x = max_min_exchange, fill = max_min)) + geom_bar() + theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + ggtitle(&quot;Exchanges with the highest and lowest price differences in Bitcoin&quot;) + ylab(&quot;Frequency&quot;) + xlab(&quot;Exchange&quot;) + scale_fill_discrete(name = &quot;BTC Price Diff. Type&quot;)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># now make a barplot </span></span>
<span class="line"><span style="color: #F8F8F2">ggplot(tmp2, aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> max_min_exchange, </span><span style="color: #FD971F">fill</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> max_min)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_bar() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme(</span><span style="color: #FD971F">axis.text.x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> element_text(</span><span style="color: #FD971F">angle</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">90</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">hjust</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ggtitle(</span><span style="color: #E6DB74">&quot;Exchanges with the highest and lowest price differences in Bitcoin&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Frequency&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> xlab(</span><span style="color: #E6DB74">&quot;Exchange&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> scale_fill_discrete(</span><span style="color: #FD971F">name</span><span 
style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC Price Diff. Type&quot;</span><span style="color: #F8F8F2">)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="500" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_L0E0OJPfWfCxcc1DKzVmiw.webp" alt="" class="wp-image-4905" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_L0E0OJPfWfCxcc1DKzVmiw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_L0E0OJPfWfCxcc1DKzVmiw-500x302.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_L0E0OJPfWfCxcc1DKzVmiw-150x91.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_L0E0OJPfWfCxcc1DKzVmiw-768x464.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p id="8183">This is interesting: <strong>Kraken</strong> seems to be the exchange that most frequently has the highest price for <strong>bitcoin</strong>. On the other hand, <strong>Bitstamp</strong> seems to be the one that most frequently has the lowest price among the exchanges. So if you want to do <a href="https://www.investopedia.com/terms/a/arbitrage.asp" target="_blank" rel="noreferrer noopener">arbitrage</a>, your best bet is to buy on <strong>Bitstamp</strong> and sell on <strong>Kraken</strong>.</p>
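
<p>The bar heights can also be read off as a table by counting rows per exchange and price type. The sketch below uses a tiny stand-in for <code>tmp2</code> (<code>toy2</code>, whose rows and exchange labels are invented for illustration), so only the pattern carries over to the real table.</p>

<pre class="wp-block-code"><code># Minimal sketch; 'toy2' is an invented stand-in for tmp2
library(data.table)

toy2 &lt;- data.table(
  exchange = c("kraken", "bitstamp", "kraken", "bitstamp", "kraken", "coinbase"),
  max_min  = c("max", "min", "max", "min", "min", "max")
)

# count how often each exchange had the daily maximum or minimum price,
# sorted so the most frequent exchange comes first within each type
counts &lt;- toy2[, .N, by = .(exchange, max_min)][order(max_min, -N)]
print(counts)</code></pre>

<p>Running the same two-step expression on <code>tmp2</code> should give the counts that <code>geom_bar()</code> draws, which makes it easier to compare exchanges whose bars are close in height.</p>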






<p>Read more posts on the AnalyticaDSS blog: <a href="https://analyticadss.com/blog">BLOGS</a></p>



<p>Read more posts on Medium: <a href="https://medium.com/@aousabdo">Medium Blogs</a></p>



<p>Read more posts on R-bloggers: <a href="https://www.r-bloggers.com/">https://www.r-bloggers.com</a></p>
<p>The post <a href="https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-1/">Analyzing Crypto Markets using R — Part 1</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
