Mapping and Analyzing Geospatial Trends: A Python Approach

By

Imonikhe Ayeni

---

INTRODUCTION

This project, titled "Mapping and Analyzing Geospatial Trends: A Python Approach", explores the integration of geospatial data visualization and sentiment analysis using Python. The work is organized into three distinct tasks, each focusing on a specific aspect of geospatial data handling:

  • Geospatial Visualization:

This task demonstrates the use of Python-based geospatial visualization tools, specifically GeoPandas, to analyze and present real-world datasets. GeoPandas facilitates the creation of both static and interactive maps, providing a comprehensive view of the geospatial information.

  • Geospatial Data Analysis:

This task delves into professional methods for analyzing geospatial datasets. It utilizes the World Total Population dataset from the World Bank and the World Cereal Yield dataset. The objective is to identify significant relationships between the population of different countries and their cereal yield, showcasing the potential correlations and insights that can be derived from such analyses.

  • Geospatial Sentiment Analysis:

In this task, geospatial sentiment analysis is applied to Twitter (now X) data using the Python library TextBlob. The dataset comprises tweets about cryptocurrency. The aim is to extract and map the sentiments expressed in these tweets, highlighting the geographical distribution of opinions and trends in the cryptocurrency domain.

OBJECTIVES

The objectives of this project are as follows:

  • Identify the Concepts Underlying Geospatial Analysis:

Identify the concepts underlying geospatial analysis through the application of Python-based tools and libraries.

  • Apply Social Analytics Techniques:

Implement appropriate techniques to analyze social information, particularly through the sentiment analysis of Twitter data related to cryptocurrency.

  • Design and Implement a Geospatial Analysis Framework:

Develop, prototype, and execute a comprehensive framework for geospatial analysis, integrating various datasets and visualization methods to derive meaningful insights.

  • Identify Relationships Between Geospatial Datasets:

Investigate and identify significant relationships between different geospatial datasets, such as the correlation between the population of different countries and their cereal yield.

  • Enhance Data Visualization Skills:

Improve skills in creating both static and interactive geospatial visualizations, making complex data more accessible and understandable to a diverse audience.

METHODOLOGY

The methodology for this project involves the following steps, corresponding to each of the three main tasks:

  1. Geospatial Visualization
  • Data Collection:

Obtain a real-world dataset suitable for geospatial analysis.

  • Data Preparation:

Clean and preprocess the dataset to ensure it is ready for analysis using GeoPandas.

  • Visualization:

Use GeoPandas' .plot() and .explore() methods to create both static and interactive maps. Employ additional libraries such as Folium and Mapclassify to enhance the interactivity and classification of the maps.

  • Presentation:

Display the static maps for quick insights and interactive maps for detailed exploration, allowing users to hover, zoom, and interact with the geospatial data.

  2. Geospatial Data Analysis
  • Data Collection:

Source the World Total Population dataset from the World Bank and the World Cereal Yield dataset.

  • Data Preparation:

Clean and preprocess the datasets to ensure compatibility for analysis.

  • Analysis:

Use statistical and geospatial analysis techniques to explore the relationship between the population of different countries and their cereal yield. Employ visualization tools to present the findings in a clear and interpretable manner.

  • Interpretation:

Analyze the results to identify significant correlations or trends, providing insights into the potential relationship between population and cereal yield.

  3. Geospatial Sentiment Analysis
  • Data Collection:

Collect tweets relevant to cryptocurrency from Twitter (now X).

  • Data Preparation:

Clean and preprocess the tweet dataset, including text normalization and location extraction.

  • Sentiment Analysis:

Use the TextBlob library to perform sentiment analysis on the tweets, categorizing them into positive, negative, and neutral sentiments.

  • Geospatial Mapping:

Map the sentiments geographically using GeoPandas to visualize the distribution of opinions and trends across different regions.

  • Interpretation:

Analyze the mapped sentiments to identify regional trends and insights related to cryptocurrency discussions on Twitter.
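
The categorization step of the sentiment task can be sketched as follows. This is a minimal illustration, assuming polarity scores in the range -1 to 1 such as those returned by TextBlob(text).sentiment.polarity. The function name label_sentiment, the threshold parameter, and the (region, polarity) pairs are hypothetical and are not taken from the tweet dataset.

```python
from collections import Counter


def label_sentiment(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Hypothetical (region, polarity) pairs; real scores would come from
# TextBlob(text).sentiment.polarity applied to each cleaned tweet.
scored_tweets = [
    ("Nigeria", 0.45),
    ("Nigeria", -0.20),
    ("United Kingdom", 0.0),
    ("United Kingdom", 0.10),
]

# Count labels per region, ready to be joined to country geometries.
counts = Counter((region, label_sentiment(score)) for region, score in scored_tweets)
for (region, label), n in sorted(counts.items()):
    print(region, label, n)
```

A threshold above zero would widen the neutral band, which may suit short, noisy tweets.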

My first step is to install the required libraries. Note that folium and mapclassify are installed as well; they support the interactive maps.

In [5]:
!pip install geopandas folium matplotlib mapclassify
Collecting geopandas
  Downloading geopandas-1.0.1-py3-none-any.whl.metadata (2.2 kB)
Collecting folium
  Downloading folium-0.17.0-py2.py3-none-any.whl.metadata (3.8 kB)
Requirement already satisfied: matplotlib in c:\users\user\anaconda3\lib\site-packages (3.8.4)
Collecting mapclassify
  Downloading mapclassify-2.8.0-py3-none-any.whl.metadata (2.8 kB)
Requirement already satisfied: numpy>=1.22 in c:\users\user\anaconda3\lib\site-packages (from geopandas) (1.26.4)
Collecting pyogrio>=0.7.2 (from geopandas)
  Downloading pyogrio-0.9.0-cp312-cp312-win_amd64.whl.metadata (3.9 kB)
Requirement already satisfied: packaging in c:\users\user\anaconda3\lib\site-packages (from geopandas) (23.2)
Requirement already satisfied: pandas>=1.4.0 in c:\users\user\anaconda3\lib\site-packages (from geopandas) (2.2.2)
Collecting pyproj>=3.3.0 (from geopandas)
  Downloading pyproj-3.6.1-cp312-cp312-win_amd64.whl.metadata (31 kB)
Collecting shapely>=2.0.0 (from geopandas)
  Downloading shapely-2.0.5-cp312-cp312-win_amd64.whl.metadata (7.2 kB)
Collecting branca>=0.6.0 (from folium)
  Downloading branca-0.7.2-py3-none-any.whl.metadata (1.5 kB)
Requirement already satisfied: jinja2>=2.9 in c:\users\user\anaconda3\lib\site-packages (from folium) (3.1.4)
Requirement already satisfied: requests in c:\users\user\anaconda3\lib\site-packages (from folium) (2.32.2)
Requirement already satisfied: xyzservices in c:\users\user\anaconda3\lib\site-packages (from folium) (2022.9.0)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\user\anaconda3\lib\site-packages (from matplotlib) (1.2.0)
Requirement already satisfied: cycler>=0.10 in c:\users\user\anaconda3\lib\site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\user\anaconda3\lib\site-packages (from matplotlib) (4.51.0)
Requirement already satisfied: kiwisolver>=1.3.1 in c:\users\user\anaconda3\lib\site-packages (from matplotlib) (1.4.4)
Requirement already satisfied: pillow>=8 in c:\users\user\anaconda3\lib\site-packages (from matplotlib) (10.3.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\user\anaconda3\lib\site-packages (from matplotlib) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\user\anaconda3\lib\site-packages (from matplotlib) (2.9.0.post0)
Requirement already satisfied: networkx>=2.7 in c:\users\user\anaconda3\lib\site-packages (from mapclassify) (3.2.1)
Requirement already satisfied: scikit-learn>=1.0 in c:\users\user\anaconda3\lib\site-packages (from mapclassify) (1.4.2)
Requirement already satisfied: scipy>=1.8 in c:\users\user\anaconda3\lib\site-packages (from mapclassify) (1.13.1)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\user\anaconda3\lib\site-packages (from jinja2>=2.9->folium) (2.1.3)
Requirement already satisfied: pytz>=2020.1 in c:\users\user\anaconda3\lib\site-packages (from pandas>=1.4.0->geopandas) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in c:\users\user\anaconda3\lib\site-packages (from pandas>=1.4.0->geopandas) (2023.3)
Requirement already satisfied: certifi in c:\users\user\anaconda3\lib\site-packages (from pyogrio>=0.7.2->geopandas) (2024.7.4)
Requirement already satisfied: six>=1.5 in c:\users\user\anaconda3\lib\site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
Requirement already satisfied: joblib>=1.2.0 in c:\users\user\anaconda3\lib\site-packages (from scikit-learn>=1.0->mapclassify) (1.4.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\user\anaconda3\lib\site-packages (from scikit-learn>=1.0->mapclassify) (2.2.0)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\user\anaconda3\lib\site-packages (from requests->folium) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in c:\users\user\anaconda3\lib\site-packages (from requests->folium) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\user\anaconda3\lib\site-packages (from requests->folium) (2.2.2)
Downloading geopandas-1.0.1-py3-none-any.whl (323 kB)
Downloading folium-0.17.0-py2.py3-none-any.whl (108 kB)
Downloading mapclassify-2.8.0-py3-none-any.whl (58 kB)
Downloading branca-0.7.2-py3-none-any.whl (25 kB)
Downloading pyogrio-0.9.0-cp312-cp312-win_amd64.whl (15.9 MB)
Downloading pyproj-3.6.1-cp312-cp312-win_amd64.whl (6.1 MB)
Downloading shapely-2.0.5-cp312-cp312-win_amd64.whl (1.4 MB)
Installing collected packages: shapely, pyproj, pyogrio, branca, mapclassify, geopandas, folium
Successfully installed branca-0.7.2 folium-0.17.0 geopandas-1.0.1 mapclassify-2.8.0 pyogrio-0.9.0 pyproj-3.6.1 shapely-2.0.5

I will be using interactive maps in this work, so I import folium and mapclassify. Pandas helps me set up and manage dataframes. GeoPandas, a Python library built on pandas and essential for working with vector data, provides the geometry column needed for mapping (Crickard, 2018). Matplotlib assists with visualisation.

In [2]:
import geopandas as gpd    # import the necessary dependencies
import pandas as pd
import matplotlib.pyplot as plt
import folium
import mapclassify

Task 1.1: Application of Python-based geospatial visualisation tool (e.g., GeoPandas) on a real-world dataset

INSTRUCTION

This task requires you to use the dataset, cereal yield. Use a Python-based visualisation tool (such as GeoPandas) to plot a set of choropleth maps representing the world cereal yield (kg per hectare) for the years 2019 and 2020 respectively. The solution should be in a Jupyter notebook (.ipynb), wherein all the functions, libraries and coding steps should be explained in a lucid manner. Major steps for generating the choropleths would typically involve importing the datasets using appropriate Python libraries, data cleaning, geospatial operations, and plotting. The Jupyter Notebook should be able to reproduce the choropleth maps without any error.

In [3]:
# Read the dataset with pandas. skiprows=4 skips the metadata lines that precede the header row in the World Bank CSV.
df_cereal_yield = pd.read_csv(r'API_AG.YLD.CREL.KG_DS2_en_csv_v2_5734359.csv', skiprows=4)
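
To illustrate why skiprows=4 is needed, here is a small sketch that parses a synthetic stand-in for the World Bank file layout. The text in raw is made up for the example (including the placeholder date), and the trailing commas are an assumption meant to mimic the real file, where they produce the empty "Unnamed: 67" column seen in the outputs of this notebook.

```python
import io

import pandas as pd

# Synthetic stand-in for the World Bank layout: metadata lines, then the
# header row. The trailing commas create an empty "Unnamed: N" column.
raw = (
    '"Data Source","World Development Indicators",\n'
    "\n"
    '"Last Updated Date","YYYY-MM-DD",\n'
    "\n"
    '"Country Name","Country Code","2019","2020",\n'
    '"Zambia","ZMB","2400.4","2481.6",\n'
    '"Kosovo","XKX","","",\n'
)

# Skip the four lines above the header row.
df = pd.read_csv(io.StringIO(raw), skiprows=4)

# Drop columns that are entirely empty, such as the trailing "Unnamed" one.
df = df.dropna(axis="columns", how="all")
print(df)
```
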
In [4]:
df_cereal_yield.tail() #inspect dataset
Out[4]:
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2014 2015 2016 2017 2018 2019 2020 2021 2022 Unnamed: 67
261 Kosovo XKX Cereal yield (kg per hectare) AG.YLD.CREL.KG NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
262 Yemen, Rep. YEM Cereal yield (kg per hectare) AG.YLD.CREL.KG NaN 782.5 780.7 771.8 776.1 773.6 ... 962.7 784.2 687.0 699.0 682.8 864.9 861.1 791.8 NaN NaN
263 South Africa ZAF Cereal yield (kg per hectare) AG.YLD.CREL.KG NaN 1099.1 1142.1 1128.0 913.9 911.4 ... 4899.6 3348.4 3623.1 5331.8 4652.1 4101.4 5120.6 5124.7 NaN NaN
264 Zambia ZMB Cereal yield (kg per hectare) AG.YLD.CREL.KG NaN 822.2 801.4 706.9 788.9 823.5 ... 2774.9 3026.4 2432.2 2489.9 2168.1 2400.4 2481.6 2525.0 NaN NaN
265 Zimbabwe ZWE Cereal yield (kg per hectare) AG.YLD.CREL.KG NaN 919.7 905.9 822.5 820.5 930.8 ... 831.4 557.5 435.1 1203.3 1254.3 748.0 1148.6 1545.2 NaN NaN

5 rows × 68 columns

In [5]:
#Inspect Dataset
In [6]:
df_cereal_yield.columns
Out[6]:
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022',
       'Unnamed: 67'],
      dtype='object')
In [7]:
df_cereal_yield.isna().sum()
Out[7]:
Country Name        0
Country Code        0
Indicator Name      0
Indicator Code      0
1960              266
                 ... 
2019               39
2020               39
2021               39
2022              266
Unnamed: 67       266
Length: 68, dtype: int64
In [8]:
#extract useful columns according to the instruction
useful_columns = ['Country Name','Country Code','2019', '2020']
In [9]:
cereal_yield = df_cereal_yield[useful_columns]
In [10]:
cereal_yield
Out[10]:
Country Name Country Code 2019 2020
0 Aruba ABW NaN NaN
1 Africa Eastern and Southern AFE 1717.894885 1838.762607
2 Afghanistan AFG 2113.400000 1979.900000
3 Africa Western and Central AFW 1343.462790 1381.643141
4 Angola AGO 958.800000 992.500000
... ... ... ... ...
261 Kosovo XKX NaN NaN
262 Yemen, Rep. YEM 864.900000 861.100000
263 South Africa ZAF 4101.400000 5120.600000
264 Zambia ZMB 2400.400000 2481.600000
265 Zimbabwe ZWE 748.000000 1148.600000

266 rows × 4 columns

In [11]:
cereal_yield.isna().sum() #view null values
Out[11]:
Country Name     0
Country Code     0
2019            39
2020            39
dtype: int64

I am going to drop the rows with null values. I cannot fill the null values with the mean or mode, because each country has its peculiarities.

In [12]:
cereal_yield = cereal_yield.dropna(subset=['2019', '2020'])  # reassign instead of using inplace=True, which raises SettingWithCopyWarning on a slice
In [13]:
cereal_yield
Out[13]:
Country Name Country Code 2019 2020
1 Africa Eastern and Southern AFE 1717.894885 1838.762607
2 Afghanistan AFG 2113.400000 1979.900000
3 Africa Western and Central AFW 1343.462790 1381.643141
4 Angola AGO 958.800000 992.500000
5 Albania ALB 5038.200000 5209.200000
... ... ... ... ...
259 World WLD 4125.447810 4116.427597
262 Yemen, Rep. YEM 864.900000 861.100000
263 South Africa ZAF 4101.400000 5120.600000
264 Zambia ZMB 2400.400000 2481.600000
265 Zimbabwe ZWE 748.000000 1148.600000

227 rows × 4 columns

In [14]:
cereal_yield.isna().sum() #all null values removed
Out[14]:
Country Name    0
Country Code    0
2019            0
2020            0
dtype: int64

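I will now load the Natural Earth low-resolution dataset bundled with GeoPandas (naturalearth_lowres) and merge it with the cereal yield data. The geopandas.datasets module used here was removed in GeoPandas 1.0, so these cells need an earlier GeoPandas release or a local copy of the Natural Earth file.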

In [15]:
#check available geopandas datasets
gpd.datasets.available 
Out[15]:
['naturalearth_cities', 'naturalearth_lowres', 'nybb']
In [16]:
#use naturalearth lowres
earth= gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
C:\Users\User\AppData\Local\Temp\ipykernel_8832\3099934314.py:2: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
  earth= gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
In [17]:
earth
Out[17]:
pop_est continent name iso_a3 gdp_md_est geometry
0 889953.0 Oceania Fiji FJI 5496 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 58005463.0 Africa Tanzania TZA 63177 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253.0 Africa W. Sahara ESH 907 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 37589262.0 North America Canada CAN 1736425 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 328239523.0 North America United States of America USA 21433226 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
... ... ... ... ... ... ...
172 6944975.0 Europe Serbia SRB 51475 POLYGON ((18.82982 45.90887, 18.82984 45.90888...
173 622137.0 Europe Montenegro MNE 5542 POLYGON ((20.07070 42.58863, 19.80161 42.50009...
174 1794248.0 Europe Kosovo -99 7926 POLYGON ((20.59025 41.85541, 20.52295 42.21787...
175 1394973.0 North America Trinidad and Tobago TTO 24269 POLYGON ((-61.68000 10.76000, -61.10500 10.890...
176 11062113.0 Africa S. Sudan SSD 11998 POLYGON ((30.83385 3.50917, 29.95350 4.17370, ...

177 rows × 6 columns

In [18]:
#keep only the useful columns
earth= earth[["iso_a3","geometry"]]
earth
Out[18]:
iso_a3 geometry
0 FJI MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 TZA POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 ESH POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 CAN MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 USA MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
... ... ...
172 SRB POLYGON ((18.82982 45.90887, 18.82984 45.90888...
173 MNE POLYGON ((20.07070 42.58863, 19.80161 42.50009...
174 -99 POLYGON ((20.59025 41.85541, 20.52295 42.21787...
175 TTO POLYGON ((-61.68000 10.76000, -61.10500 10.890...
176 SSD POLYGON ((30.83385 3.50917, 29.95350 4.17370, ...

177 rows × 2 columns

In [19]:
#rename "iso_a3" to Country Code to create a common column with cereal_yield dataframe 
earth= earth.rename(columns={"iso_a3":'Country Code'})
In [20]:
earth #Inspect
Out[20]:
Country Code geometry
0 FJI MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 TZA POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 ESH POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 CAN MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 USA MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
... ... ...
172 SRB POLYGON ((18.82982 45.90887, 18.82984 45.90888...
173 MNE POLYGON ((20.07070 42.58863, 19.80161 42.50009...
174 -99 POLYGON ((20.59025 41.85541, 20.52295 42.21787...
175 TTO POLYGON ((-61.68000 10.76000, -61.10500 10.890...
176 SSD POLYGON ((30.83385 3.50917, 29.95350 4.17370, ...

177 rows × 2 columns

In [21]:
#we shall merge earth data frame and cereal_yield dataframe and call the eventual dataframe earth_yield
earth_yield= earth.merge(cereal_yield, on='Country Code')
In [22]:
earth_yield
Out[22]:
Country Code geometry Country Name 2019 2020
0 FJI MULTIPOLYGON (((180.00000 -16.06713, 180.00000... Fiji 3353.4 3665.7
1 TZA POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... Tanzania 1883.3 1698.4
2 CAN MULTIPOLYGON (((-122.84000 49.00000, -122.9742... Canada 4010.8 4095.4
3 USA MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States 8006.1 8145.3
4 KAZ POLYGON ((87.35997 49.21498, 86.59878 48.54918... Kazakhstan 1154.5 1288.6
... ... ... ... ... ...
162 MKD POLYGON ((22.38053 42.32026, 22.88137 41.99930... North Macedonia 3554.0 3664.2
163 SRB POLYGON ((18.82982 45.90887, 18.82984 45.90888... Serbia 6126.9 6559.4
164 MNE POLYGON ((20.07070 42.58863, 19.80161 42.50009... Montenegro 3166.8 3260.0
165 TTO POLYGON ((-61.68000 10.76000, -61.10500 10.890... Trinidad and Tobago 1529.7 1520.3
166 SSD POLYGON ((30.83385 3.50917, 29.95350 4.17370, ... South Sudan 880.5 885.5

167 rows × 5 columns
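
The earth dataframe has 177 rows and the merged result has 167, so ten map features found no match (for example, Kosovo carries the placeholder code -99), and aggregates such as "World" drop out because they have no geometry. A minimal sketch of how such mismatches could be listed, using small stand-in frames rather than the real data, with hypothetical names earth_codes and yield_codes:

```python
import pandas as pd

# Stand-ins for the two frames; "-99" is the placeholder code that
# Natural Earth assigns to Kosovo in the output above.
earth_codes = pd.DataFrame({"Country Code": ["FJI", "TZA", "ESH", "-99"]})
yield_codes = pd.DataFrame(
    {
        "Country Code": ["FJI", "TZA", "WLD"],
        "Country Name": ["Fiji", "Tanzania", "World"],
        "2019": [3353.4, 1883.3, 4125.447810],
    }
)

# how="left" keeps every map feature; indicator=True adds a "_merge"
# column saying whether a match was found.
checked = earth_codes.merge(yield_codes, on="Country Code", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Country Code"].tolist()
print(unmatched)
```

The default inner merge used in the notebook silently discards these rows, which is why the black base layer is drawn first in the static maps.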

I will sort the dataframe in descending order to show the five countries with the highest cereal yield and the five with the lowest.

In [23]:
earth_yield.sort_values(by='2019', ascending= False)
Out[23]:
Country Code geometry Country Name 2019 2020
79 ARE POLYGON ((51.57952 24.24550, 51.75744 24.29407... United Arab Emirates 23842.3 25980.3
81 KWT POLYGON ((47.97452 29.97582, 48.18319 29.53448... Kuwait 17625.4 13393.1
83 OMN MULTIPOLYGON (((55.20834 22.70833, 55.23449 23... Oman 13291.0 18835.1
124 BEL POLYGON ((6.15666 50.80372, 6.04307 50.12805, ... Belgium 8988.5 8430.6
125 NLD POLYGON ((6.90514 53.48216, 7.09205 53.14404, ... Netherlands 8654.2 7919.9
... ... ... ... ... ...
84 VUT MULTIPOLYGON (((167.21680 -15.89185, 167.84488... Vanuatu 618.2 609.7
13 SDN POLYGON ((24.56737 8.22919, 23.80581 8.66632, ... Sudan 552.9 489.6
11 SOM POLYGON ((41.58513 -1.68325, 40.99300 -0.85829... Somalia 522.2 502.5
51 NER POLYGON ((14.85130 22.86295, 15.09689 21.30852... Niger 501.5 560.2
75 GMB POLYGON ((-16.71373 13.59496, -15.62460 13.623... Gambia, The 443.8 501.9

167 rows × 5 columns

In [24]:
earth_yield.sort_values(by='2020', ascending= False)
Out[24]:
Country Code geometry Country Name 2019 2020
79 ARE POLYGON ((51.57952 24.24550, 51.75744 24.29407... United Arab Emirates 23842.3 25980.3
83 OMN MULTIPOLYGON (((55.20834 22.70833, 55.23449 23... Oman 13291.0 18835.1
81 KWT POLYGON ((47.97452 29.97582, 48.18319 29.53448... Kuwait 17625.4 13393.1
131 NZL MULTIPOLYGON (((176.88582 -40.06598, 176.50802... New Zealand 8205.0 9039.3
129 NCL POLYGON ((165.77999 -21.08000, 166.59999 -21.7... New Caledonia 7031.1 8689.9
... ... ... ... ... ...
11 SOM POLYGON ((41.58513 -1.68325, 40.99300 -0.85829... Somalia 522.2 502.5
75 GMB POLYGON ((-16.71373 13.59496, -15.62460 13.623... Gambia, The 443.8 501.9
13 SDN POLYGON ((24.56737 8.22919, 23.80581 8.66632, ... Sudan 552.9 489.6
22 LSO POLYGON ((28.97826 -28.95560, 29.32517 -29.257... Lesotho 695.5 433.3
46 NAM POLYGON ((19.89577 -24.76779, 19.89473 -28.461... Namibia 731.5 429.5

167 rows × 5 columns
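
As an alternative to sorting the whole dataframe, the extremes can be pulled out directly. The sketch below uses pandas' nlargest and nsmallest on a small sample of the 2019 yields shown above; the name sample is hypothetical.

```python
import pandas as pd

# 2019 yields (kg per hectare) copied from the sorted output above.
sample = pd.DataFrame(
    {
        "Country Name": [
            "United Arab Emirates", "Kuwait", "Oman", "Belgium", "Netherlands",
            "Sudan", "Somalia", "Niger", "Gambia, The",
        ],
        "2019": [23842.3, 17625.4, 13291.0, 8988.5, 8654.2, 552.9, 522.2, 501.5, 443.8],
    }
)

top3 = sample.nlargest(3, "2019")      # highest yields first
bottom3 = sample.nsmallest(3, "2019")  # lowest yields first
print(top3)
print(bottom3)
```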

Fig 1: World cereal yield (kg per hectare) for the year 2019 with .plot

In [25]:
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "black")
earth_yield.plot(ax=ax, column="2019", legend=True, legend_kwds={"label": "World cereal yield (kg per hectare) for the year 2019","orientation":"horizontal"}, cmap='Set1')
plt.show()
[Figure 1: choropleth map of world cereal yield (kg per hectare), 2019]
In [26]:
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "black")
earth_yield.plot(ax=ax, column="2020", legend=True, legend_kwds={"label": "World cereal yield (kg per hectare) for the year 2020","orientation":"horizontal"}, cmap='Set1')
plt.show()
[Figure: choropleth map of world cereal yield (kg per hectare), 2020]
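
"Set1" is a qualitative colormap, so on a continuous variable such as yield it may be hard to read, and a few very high values (the UAE, Kuwait, Oman) stretch the colour scale. One option is a sequential colormap combined with classed bins; GeoPandas' .plot() accepts a scheme argument (for example scheme="quantiles") when mapclassify is installed. The sketch below does not call GeoPandas. It only uses pandas' qcut to show how quantile classes would group ten of the 2019 yields from the tables above.

```python
import pandas as pd

# 2019 yields (kg per hectare) for ten countries from the tables above.
yields = pd.Series(
    [23842.3, 17625.4, 13291.0, 8988.5, 8654.2, 8006.1, 4010.8, 1883.3, 522.2, 443.8],
    index=["ARE", "KWT", "OMN", "BEL", "NLD", "USA", "CAN", "TZA", "SOM", "GMB"],
)

# Five quantile classes: each class holds the same number of countries,
# so a few very high values no longer stretch the colour scale.
classes = pd.qcut(yields, q=5, labels=False)
print(classes.sort_values())
```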

I will now use .explore for my visualisation. The motivation for using interactive maps is explained in the Methodology section, which also names the libraries that .explore requires, with references provided.

In [27]:
earth_yield.explore(column='2019',  # make choropleth based on "2019" column
    tooltip=["Country Name",'2019'],  # show "country name and 2019 value" in tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    tiles="openstreetmap",  # I shall use "openstreetmap" tiles
    cmap="Set1",  # I shall use "Set1" matplotlib colormap
    legend=True,                
    style_kwds=dict(color="black"), # use black outline
    zoom_control=True,
                   )                
Out[27]:
[Interactive map: world cereal yield (kg per hectare), 2019]
In [28]:
earth_yield.explore(column='2019',  # make choropleth based on "2019" column
    tooltip=["Country Name",'2019'],  # show "country name and 2019 value" in tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    tiles="CartoDB positron",  # I shall use "CartoDB positron" tiles
    cmap="Set1",  # I shall use "Set1" matplotlib colormap
    legend=True,                
    style_kwds=dict(color="black"), # use black outline
    zoom_control=True,
                   )    
Out[28]:
[Interactive map: world cereal yield (kg per hectare), 2019]

Brief Explanation:

From the tables and maps, it is obvious that:

  1. The three countries with the highest cereal yield in both 2019 and 2020 (the United Arab Emirates, Kuwait, and Oman) are in Asia.
  2. The United Arab Emirates tops the list in both years, with 23842.3 and 25980.3 kg per hectare, respectively.
  3. The UAE therefore increased its yield between 2019 and 2020.
  4. The 4th and 5th countries in 2019 are in Europe: Belgium at 8988.5 kg per hectare and the Netherlands at 8654.2 kg per hectare.
  5. Belgium and the Netherlands both recorded lower yields in 2020 and lost their positions to New Zealand and New Caledonia.
  6. Policymakers may want to investigate the causes of this drop and devise measures to make cereal yields sustainable in Belgium and the Netherlands.
  7. African countries dominate the bottom of the ranking in both 2019 and 2020.
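
The year-on-year movements described in this list can be quantified with a short calculation. The sketch below uses the 2019 and 2020 yields from the sorted tables; the frame name change and the columns abs_change and change_pct are hypothetical.

```python
import pandas as pd

# Yields (kg per hectare) copied from the sorted tables above.
change = pd.DataFrame(
    {
        "2019": [23842.3, 17625.4, 13291.0, 8988.5, 8654.2],
        "2020": [25980.3, 13393.1, 18835.1, 8430.6, 7919.9],
    },
    index=["United Arab Emirates", "Kuwait", "Oman", "Belgium", "Netherlands"],
)

# Absolute and relative change from 2019 to 2020.
change["abs_change"] = change["2020"] - change["2019"]
change["change_pct"] = 100 * change["abs_change"] / change["2019"]
print(change.round(1))
```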

Task 1.2: Analysis of geospatial datasets

In this task, you are required to use one more dataset, the world's total population (source: World Bank), in addition to the cereal yield dataset used in the previous task. Both datasets are available on Moodle under the Assessment folder. All the choropleths and plots must be generated using appropriate Python-based tools.

In [29]:
df_total_pop = pd.read_csv(r"API_SP.POP.TOTL_DS2_en_csv_v2_4485025.csv", skiprows=4) # Read population dataframe
In [30]:
df_total_pop[:3] #Inspect dataframe
Out[30]:
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2013 2014 2015 2016 2017 2018 2019 2020 2021 Unnamed: 66
0 Aruba ABW Population, total SP.POP.TOTL 54208.0 55434.0 56234.0 56699.0 57029.0 57357.0 ... 103165.0 103776.0 104339.0 104865.0 105361.0 105846.0 106310.0 106766.0 107195.0 NaN
1 Africa Eastern and Southern AFE Population, total SP.POP.TOTL 130836765.0 134159786.0 137614644.0 141202036.0 144920186.0 148769974.0 ... 562601578.0 578075373.0 593871847.0 609978946.0 626392880.0 643090131.0 660046272.0 677243299.0 694665117.0 NaN
2 Afghanistan AFG Population, total SP.POP.TOTL 8996967.0 9169406.0 9351442.0 9543200.0 9744772.0 9956318.0 ... 32269592.0 33370804.0 34413603.0 35383028.0 36296111.0 37171922.0 38041757.0 38928341.0 39835428.0 NaN

3 rows × 67 columns

In [31]:
df_total_pop.columns
Out[31]:
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021',
       'Unnamed: 66'],
      dtype='object')

My next step is to select the columns needed to answer the questions in this task.

In [32]:
necessary_columns = ['Country Name','Country Code', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021'] 
In [33]:
total_pop=df_total_pop[necessary_columns]
In [34]:
total_pop.isna().sum()#I shall ignore the null values in this case as they have no direct impact on my work
Out[34]:
Country Name    0
Country Code    0
2010            1
2011            1
2012            2
2013            2
2014            2
2015            2
2016            2
2017            2
2018            2
2019            2
2020            2
2021            2
dtype: int64
In [35]:
total_pop[:2] #inspect dataframe
Out[35]:
Country Name Country Code 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
0 Aruba ABW 101665.0 102050.0 102565.0 103165.0 103776.0 104339.0 104865.0 105361.0 105846.0 106310.0 106766.0 107195.0
1 Africa Eastern and Southern AFE 518468229.0 532760424.0 547482863.0 562601578.0 578075373.0 593871847.0 609978946.0 626392880.0 643090131.0 660046272.0 677243299.0 694665117.0
In [36]:
total_pop.columns #confirming I have picked all useful columns
Out[36]:
Index(['Country Name', 'Country Code', '2010', '2011', '2012', '2013', '2014',
       '2015', '2016', '2017', '2018', '2019', '2020', '2021'],
      dtype='object')
In [37]:
cereal_yield2= df_cereal_yield[necessary_columns] # Reuse the df_cereal_yield dataframe from Task 1.1: extract the necessary columns and store them in a new dataframe
In [38]:
cereal_yield2[:2]
Out[38]:
Country Name Country Code 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
0 Aruba ABW NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Africa Eastern and Southern AFE 1553.204407 1492.710513 1650.256751 1533.952962 1636.00973 1616.362162 1490.807738 1764.116707 1728.295922 1717.894885 1838.762607 1840.899744

My next step is to merge the two dataframes. I give their columns suffixes to avoid confusion between yield and population values.

In [39]:
pop_yield = cereal_yield2.merge(total_pop, on='Country Code', suffixes=('_yield', '_pop'))
In [40]:
pop_yield[:3]
Out[40]:
Country Name_yield Country Code 2010_yield 2011_yield 2012_yield 2013_yield 2014_yield 2015_yield 2016_yield 2017_yield ... 2012_pop 2013_pop 2014_pop 2015_pop 2016_pop 2017_pop 2018_pop 2019_pop 2020_pop 2021_pop
0 Aruba ABW NaN NaN NaN NaN NaN NaN NaN NaN ... 102565.0 103165.0 103776.0 104339.0 104865.0 105361.0 105846.0 106310.0 106766.0 107195.0
1 Africa Eastern and Southern AFE 1553.204407 1492.710513 1650.256751 1533.952962 1636.00973 1616.362162 1490.807738 1764.116707 ... 547482863.0 562601578.0 578075373.0 593871847.0 609978946.0 626392880.0 643090131.0 660046272.0 677243299.0 694665117.0
2 Afghanistan AFG 2011.100000 1659.900000 2029.600000 2048.500000 2017.50000 2132.200000 1980.400000 2022.500000 ... 31161378.0 32269592.0 33370804.0 34413603.0 35383028.0 36296111.0 37171922.0 38041757.0 38928341.0 39835428.0

3 rows × 27 columns
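
The merge above can be sketched on a two-row toy example. The 2021 values are copied from the outputs shown earlier; the frame names `yield_df` and `pop_df` are hypothetical stand-ins for `cereal_yield2` and `total_pop`, and `validate="one_to_one"` is an optional safeguard that the original cell does not use.

```python
import pandas as pd

# Toy stand-ins for cereal_yield2 and total_pop (2021 values from the outputs above)
yield_df = pd.DataFrame({
    "Country Name": ["Aruba", "Africa Eastern and Southern"],
    "Country Code": ["ABW", "AFE"],
    "2021": [float("nan"), 1840.899744],
})
pop_df = pd.DataFrame({
    "Country Name": ["Aruba", "Africa Eastern and Southern"],
    "Country Code": ["ABW", "AFE"],
    "2021": [107195.0, 694665117.0],
})

# Overlapping non-key columns receive the suffixes; the key column keeps its name.
# validate="one_to_one" raises an error if a country code is duplicated on either side.
merged = yield_df.merge(
    pop_df,
    on="Country Code",
    suffixes=("_yield", "_pop"),
    validate="one_to_one",
)
print(merged.columns.tolist())
```

Because "Country Name" appears in both frames, it is suffixed as well, which is why the merged frame carries both `Country Name_yield` and `Country Name_pop`.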

The next step is to recall my earth dataframe and merge it with the pop_yield dataframe (containing population and cereal yield). The merge is on their common 'Country Code' column.

In [41]:
earth_yield2 = earth.merge(pop_yield, on='Country Code')
In [42]:
earth_yield2
Out[42]:
Country Code geometry Country Name_yield 2010_yield 2011_yield 2012_yield 2013_yield 2014_yield 2015_yield 2016_yield ... 2012_pop 2013_pop 2014_pop 2015_pop 2016_pop 2017_pop 2018_pop 2019_pop 2020_pop 2021_pop
0 FJI MULTIPOLYGON (((180.00000 -16.06713, 180.00000... Fiji 2871.1 2821.3 2423.4 2999.5 4475.4 3001.0 3000.9 ... 865065.0 865602.0 866447.0 868632.0 872406.0 877460.0 883490.0 889955.0 896444.0 902899.0
1 TZA POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... Tanzania 1647.9 1390.4 1314.8 1418.0 1524.8 1449.4 1570.4 ... 47053033.0 48483132.0 49960563.0 51482638.0 53049231.0 54660345.0 56313444.0 58005461.0 59734213.0 61498438.0
2 CAN MULTIPOLYGON (((-122.84000 49.00000, -122.9742... Canada 3501.1 3552.4 3456.3 4160.4 3647.0 3673.0 4239.1 ... 34714222.0 35082954.0 35437435.0 35702908.0 36109487.0 36545236.0 37065084.0 37601230.0 38037204.0 38246108.0
3 USA MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States 6978.1 6803.5 5911.9 7300.9 7638.1 7430.1 8614.2 ... 313877662.0 316059947.0 318386329.0 320738994.0 323071755.0 325122128.0 326838199.0 328329953.0 331501080.0 331893745.0
4 KAZ POLYGON ((87.35997 49.21498, 86.59878 48.54918... Kazakhstan 804.1 1688.6 865.0 1164.9 1172.7 1278.1 1347.7 ... 16792090.0 17035551.0 17288285.0 17542806.0 17794055.0 18037776.0 18276452.0 18513673.0 18755666.0 19002586.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
164 MKD POLYGON ((22.38053 42.32026, 22.88137 41.99930... North Macedonia 3329.6 3502.1 2839.3 3381.3 3900.0 3051.1 3859.2 ... 2061044.0 2064032.0 2067471.0 2070226.0 2072490.0 2074502.0 2076217.0 2076694.0 2072531.0 2065092.0
165 SRB POLYGON ((18.82982 45.90887, 18.82984 45.90888... Serbia 4958.8 4751.4 3701.7 5157.8 5960.6 4787.3 6165.9 ... 7199077.0 7164132.0 7130576.0 7095383.0 7058322.0 7020858.0 6982604.0 6945235.0 6899126.0 6844078.0
166 MNE POLYGON ((20.07070 42.58863, 19.80161 42.50009... Montenegro 3321.4 3305.7 2638.8 3770.5 3451.5 3146.5 3261.7 ... 620601.0 621207.0 621810.0 622159.0 622303.0 622373.0 622227.0 622028.0 621306.0 620173.0
167 TTO POLYGON ((-61.68000 10.76000, -61.10500 10.890... Trinidad and Tobago 1667.4 1639.5 1471.7 1610.5 1329.4 1110.4 1444.3 ... 1344814.0 1353708.0 1362337.0 1370332.0 1377563.0 1384060.0 1389841.0 1394969.0 1399491.0 1403374.0
168 SSD POLYGON ((30.83385 3.50917, 29.95350 4.17370, ... South Sudan NaN NaN 705.2 765.6 1253.7 907.3 879.1 ... 10113648.0 10355030.0 10554882.0 10715657.0 10832520.0 10910774.0 10975924.0 11062114.0 11193729.0 11381377.0

169 rows × 28 columns
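
Because `merge` performs an inner join by default, a country whose code is missing from either frame drops out without any warning. A minimal sketch for listing such rows is given below. It uses plain pandas frames as hypothetical stand-ins for `earth` and `pop_yield`, and the code `XXX` is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-ins: 'shapes' plays the role of earth, 'stats' of pop_yield
shapes = pd.DataFrame({"Country Code": ["FJI", "TZA", "XXX"]})
stats = pd.DataFrame({
    "Country Code": ["FJI", "TZA", "ABW"],
    "2021_pop": [902899.0, 61498438.0, 107195.0],
})

# how="outer" keeps unmatched rows; indicator=True adds a '_merge' column
check = shapes.merge(stats, on="Country Code", how="outer", indicator=True)

# Rows present on only one side would be dropped by the default inner join
unmatched = check.loc[check["_merge"] != "both", ["Country Code", "_merge"]]
print(unmatched)
```

Running the same check on the real frames would show which map shapes have no statistics and which statistics (for example regional aggregates such as AFE) have no shape.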

Task 1.2.1

For the year 2021, generate choropleth maps of cereal yield for only the countries having a population less than or equal to 67326569. Very briefly interpret the generated map.

With the dataframe prepared, I can now respond to the questions, filtering as each one requires.

In [43]:
earth_yield2.columns
Out[43]:
Index(['Country Code', 'geometry', 'Country Name_yield', '2010_yield',
       '2011_yield', '2012_yield', '2013_yield', '2014_yield', '2015_yield',
       '2016_yield', '2017_yield', '2018_yield', '2019_yield', '2020_yield',
       '2021_yield', 'Country Name_pop', '2010_pop', '2011_pop', '2012_pop',
       '2013_pop', '2014_pop', '2015_pop', '2016_pop', '2017_pop', '2018_pop',
       '2019_pop', '2020_pop', '2021_pop'],
      dtype='object')
In [44]:
# I will drop all null values at this point
earth_yield2.dropna(inplace= True)
In [45]:
#filter for 2021
earth_yield_2021 = earth_yield2[['Country Code', 'geometry', 'Country Name_yield', '2021_yield', '2021_pop']]
In [46]:
earth_yield_2021 # This dataframe has all countries their population and yield for 2021. I shall filter according to task from here
Out[46]:
Country Code geometry Country Name_yield 2021_yield 2021_pop
0 FJI MULTIPOLYGON (((180.00000 -16.06713, 180.00000... Fiji 3887.3 902899.0
1 TZA POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... Tanzania 1651.1 61498438.0
2 CAN MULTIPOLYGON (((-122.84000 49.00000, -122.9742... Canada 3078.3 38246108.0
3 USA MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States 8268.0 331893745.0
4 KAZ POLYGON ((87.35997 49.21498, 86.59878 48.54918... Kazakhstan 1049.0 19002586.0
... ... ... ... ... ...
163 BIH POLYGON ((18.56000 42.65000, 17.67492 43.02856... Bosnia and Herzegovina 4422.1 3263459.0
164 MKD POLYGON ((22.38053 42.32026, 22.88137 41.99930... North Macedonia 3537.8 2065092.0
165 SRB POLYGON ((18.82982 45.90887, 18.82984 45.90888... Serbia 5768.6 6844078.0
166 MNE POLYGON ((20.07070 42.58863, 19.80161 42.50009... Montenegro 3330.7 620173.0
167 TTO POLYGON ((-61.68000 10.76000, -61.10500 10.890... Trinidad and Tobago 1588.1 1403374.0

164 rows × 5 columns
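
Note that `dropna()` without arguments removes a row if any column is missing, which is why the frame shrinks from 169 to 164 rows: a country with a gap in an early year is excluded from the 2021 maps as well. One hedged alternative is to restrict the check to the columns a task actually uses. The sketch below uses a toy frame with made-up codes and numbers, not data from this notebook.

```python
import pandas as pd

# Toy frame: country B lacks a 2010 value but has complete 2021 data (made-up numbers)
df = pd.DataFrame({
    "Country Code": ["A", "B", "C"],
    "2010_yield": [1500.0, None, 2000.0],
    "2021_yield": [1600.0, 900.0, None],
    "2021_pop": [1.0e6, 2.0e6, 3.0e6],
})

all_cols = df.dropna()                                    # keeps only A
only_2021 = df.dropna(subset=["2021_yield", "2021_pop"])  # keeps A and B

print(len(all_cols), len(only_2021))
```

Whether the stricter or the narrower rule is appropriate depends on the task; for the multi-year tasks later in this section, dropping on all year columns is the safer choice.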

In [47]:
#filter for population less than or equal to 67326569
pop_67326569 =earth_yield_2021[earth_yield_2021['2021_pop']<=67326569]

I will sort by 2021 yield and by 2021 population to show the top and bottom countries, which helps with the interpretation.

In [48]:
pop_67326569.sort_values(by='2021_yield', ascending= False) #sort descending by 2021 cereal yield
Out[48]:
Country Code geometry Country Name_yield 2021_yield 2021_pop
81 ARE POLYGON ((51.57952 24.24550, 51.75744 24.29407... United Arab Emirates 26226.2 9991083.0
85 OMN MULTIPOLYGON (((55.20834 22.70833, 55.23449 23... Oman 16461.4 5223376.0
83 KWT POLYGON ((47.97452 29.97582, 48.18319 29.53448... Kuwait 11216.7 4328553.0
133 NZL MULTIPOLYGON (((176.88582 -40.06598, 176.50802... New Zealand 8728.4 5122600.0
130 IRL POLYGON ((-6.19788 53.86757, -6.03299 53.15316... Ireland 8606.5 5028230.0
... ... ... ... ... ...
13 SDN POLYGON ((24.56737 8.22919, 23.80581 8.66632, ... Sudan 566.8 44909351.0
47 NAM POLYGON ((19.89577 -24.76779, 19.89473 -28.461... Namibia 517.3 2587344.0
11 SOM POLYGON ((41.58513 -1.68325, 40.99300 -0.85829... Somalia 502.6 16359500.0
77 GMB POLYGON ((-16.71373 13.59496, -15.62460 13.623... Gambia, The 490.7 2486937.0
52 NER POLYGON ((14.85130 22.86295, 15.09689 21.30852... Niger 349.6 25130810.0

143 rows × 5 columns

In [49]:
pop_67326569.sort_values(by='2021_pop', ascending= False) #sort descending by 2021 population
Out[49]:
Country Code geometry Country Name_yield 2021_yield 2021_pop
139 GBR MULTIPOLYGON (((-6.19788 53.86757, -6.95373 54... United Kingdom 6966.8 67326569.0
1 TZA POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... Tanzania 1651.1 61498438.0
22 ZAF POLYGON ((16.34498 -28.57671, 16.82402 -28.082... South Africa 5124.7 60041996.0
137 ITA MULTIPOLYGON (((10.44270 46.89355, 11.04856 46... Italy 5562.8 59066225.0
12 KEN POLYGON ((39.20222 -4.67677, 37.76690 -3.67712... Kenya 1487.6 54985702.0
... ... ... ... ... ...
145 BRN POLYGON ((115.45071 5.44773, 115.40570 4.95523... Brunei Darussalam 2885.2 441532.0
36 BLZ POLYGON ((-89.14308 17.80832, -89.15091 17.955... Belize 4289.5 404915.0
18 BHS MULTIPOLYGON (((-78.98000 26.79000, -78.51000 ... Bahamas, The 8419.2 396914.0
86 VUT MULTIPOLYGON (((167.21680 -15.89185, 167.84488... Vanuatu 609.5 314464.0
131 NCL POLYGON ((165.77999 -21.08000, 166.59999 -21.7... New Caledonia 7810.9 272620.0

143 rows × 5 columns
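
The top and bottom of the cohort can also be pulled out directly with `nlargest` and `nsmallest`, which avoids scanning a full sorted table. A small sketch follows, using six rows copied from the output above; the frame name `cohort` is a stand-in for `pop_67326569`.

```python
import pandas as pd

cohort = pd.DataFrame({
    "Country Name_yield": ["United Arab Emirates", "Oman", "Kuwait",
                           "Somalia", "Gambia, The", "Niger"],
    "2021_yield": [26226.2, 16461.4, 11216.7, 502.6, 490.7, 349.6],
    "2021_pop": [9991083.0, 5223376.0, 4328553.0,
                 16359500.0, 2486937.0, 25130810.0],
})

top3 = cohort.nlargest(3, "2021_yield")      # highest yields first
bottom3 = cohort.nsmallest(3, "2021_yield")  # lowest yields first
print(top3["Country Name_yield"].tolist())
print(bottom3["Country Name_yield"].tolist())
```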

In [50]:
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "black")
pop_67326569.plot(ax=ax, column="2021_yield", legend=True, legend_kwds={"label": "cereal yield for only the countries having a population less than or equal to 67326569","orientation":"horizontal"}, cmap='Set1')
plt.show()
[Figure: static choropleth of 2021 cereal yield for countries with a population of at most 67326569]
In [51]:
pop_67326569.explore(column='2021_yield',   # make choropleth based on the "2021_yield" column
    tooltip=["Country Name_yield",'2021_yield','2021_pop'],  # show "Country Name_yield",'2021_yield','2021_pop' values in tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    tiles="openstreetmap",  # I shall use "openstreetmap" tiles
    cmap="Set1",  # use "Set1" matplotlib colormap
    legend=True,                
    style_kwds=dict(color="black"), # use black outline
    zoom_control=True,
    legend_kwds={"label": "cereal yield for only the countries having a population less than or equal to 67326569","orientation":"horizontal"}
                    )
    
                    
Out[51]:
[Interactive map: 2021 cereal yield for countries with a population of at most 67326569]

Brief Explanation

  1. The United Arab Emirates at 26226.2 kg per hectare, Oman at 16461.4 kg per hectare, and Kuwait at 11216.7 kg per hectare rank first, second, and third for cereal yield in 2021 within this cohort.
  2. All three are Gulf states in Western Asia.
  3. Oceania is represented in the top five by New Zealand at 8728.4 kg per hectare.
  4. Europe is represented in the top five by the Republic of Ireland at 8606.5 kg per hectare.
  5. Africa is at the bottom: the five lowest yields (Sudan, Namibia, Somalia, The Gambia, and Niger) are all African, ranging from 566.8 down to 349.6 kg per hectare.

Policy implications and lessons learned

The UAE, Oman, and Kuwait appear to be striving for self-sufficiency in food production on a sustainable scale. Great Britain, despite having the largest population in this cohort (67326569), is not among the top five, with a yield of 6966.8 kg per hectare. Great Britain should study the models of the top cereal-yielding countries and begin to implement them immediately.

Given the political unrest around the world, the alliances being formed, and the uncertainty of the global market, relying on cereal imports and on the economic principles of specialization and international trade may spell doom when push comes to shove.

Self-sufficiency in food production is a clarion call for Great Britain and for all countries.

Task 1.2.2

For the year 2021, generate choropleth maps of cereal yield for only the countries having a population greater than or equal to 331,893,745. Very briefly interpret the generated map.

In [52]:
#filter for population greater than or equal to 331893745
pop_331893745 = earth_yield_2021[earth_yield_2021['2021_pop']>=331893745]
In [53]:
pop_331893745
Out[53]:
Country Code geometry Country Name_yield 2021_yield 2021_pop
3 USA MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States 8268.0 3.318937e+08
95 IND POLYGON ((97.32711 28.26158, 97.40256 27.88254... India 3478.8 1.393409e+09
136 CHN MULTIPOLYGON (((109.47521 18.19770, 108.65521 ... China 6320.8 1.412360e+09
In [54]:
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "black")
pop_331893745.plot(ax=ax, column="2021_yield", legend=True, legend_kwds={"label": "Cereal yield for the countries having a population greater than or equal to 331,893,745","orientation":"horizontal"}, cmap='Set1')
plt.show()
[Figure: static choropleth of 2021 cereal yield for countries with a population of at least 331,893,745]
In [55]:
pop_331893745.explore(column='2021_yield',
    tooltip=["Country Name_yield",'2021_yield','2021_pop'],  # show "Country Name_yield", '2021_yield', '2021_pop' values in tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    tiles="openstreetmap",  # use "openstreetmap" tiles
    cmap="Set1",  # use "Set1" matplotlib colormap
    legend=True,                
    style_kwds=dict(color="black"), # use black outline
    zoom_control=True,
    #zoom_start = 1,
   #scrollWheelZoom=False
                     )
Out[55]:
[Interactive map: 2021 cereal yield for countries with a population of at least 331,893,745]

Brief Explanation

The map shows that:

The United States, in gray, has a population of about 332 million people and a cereal yield of 8268.0 kg per hectare. This yield is only fair given its population.

China, in yellow, has a cereal yield of 6320.8 kg per hectare and a population of 1.412 billion people as of 2021.

I have some concern for India. I do not think that a cereal yield of 3478.8 kg per hectare is adequate for a population of 1.393 billion. All hands have to be on deck to increase cereal yield and avoid the 'doom' of Thomas Robert Malthus's theory, in which population grows geometrically while food supplies grow only arithmetically (Intelligent Economist, 2020).

I am aware that population control measures can be controversial, but we can never go wrong with increasing cereal yield.

Task 1.2.3

For the year 2021, generate choropleth maps of cereal yield for only the countries having a population between 10269022 and 1393409034. Very briefly interpret the generated map

In [56]:
pop_10269022 = earth_yield_2021[(earth_yield_2021["2021_pop"]>= 10269022) & (earth_yield_2021["2021_pop"] <=1393409034)]
In [57]:
pop_10269022.sort_values(by='2021_yield', ascending= False) #sort descending by 2021 cereal yield
Out[57]:
Country Code geometry Country Name_yield 2021_yield 2021_pop
3 USA MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States 8268.0 331893745.0
126 BEL POLYGON ((6.15666 50.80372, 6.04307 50.12805, ... Belgium 7906.2 11587882.0
127 NLD POLYGON ((6.90514 53.48216, 7.09205 53.14404, ... Netherlands 7872.3 17533405.0
40 FRA MULTIPOLYGON (((-51.65780 4.15623, -52.24934 3... France 7170.9 67499343.0
157 EGY POLYGON ((36.86623 22.00000, 32.90000 22.00000... Egypt, Arab Rep. 7132.5 104258327.0
... ... ... ... ... ...
14 TCD POLYGON ((23.83766 19.58047, 23.88689 15.61084... Chad 814.3 16914985.0
153 YEM POLYGON ((52.00001 19.00000, 52.78218 17.34974... Yemen, Rep. 791.8 30490639.0
13 SDN POLYGON ((24.56737 8.22919, 23.80581 8.66632, ... Sudan 566.8 44909351.0
11 SOM POLYGON ((41.58513 -1.68325, 40.99300 -0.85829... Somalia 502.6 16359500.0
52 NER POLYGON ((14.85130 22.86295, 15.09689 21.30852... Niger 349.6 25130810.0

87 rows × 5 columns
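
The two-sided filter can be written more compactly with `Series.between`, which is inclusive at both ends by default and therefore behaves like the `>=` / `<=` pair used above. A sketch on four rows taken from outputs in this notebook:

```python
import pandas as pd

df = pd.DataFrame({
    "Country Name_yield": ["China", "United States", "France", "Fiji"],
    "2021_pop": [1412360000.0, 331893745.0, 67499343.0, 902899.0],
})

low, high = 10269022, 1393409034
# between() is inclusive on both ends by default, like (>= low) & (<= high)
mid_cohort = df[df["2021_pop"].between(low, high)]
print(mid_cohort["Country Name_yield"].tolist())
```

China falls above the upper bound and Fiji below the lower bound, so only the United States and France remain in this toy cohort.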

I will also sort by 2021 population to aid my discussion

In [58]:
pop_10269022.sort_values(by='2021_pop', ascending= False) #sort descending by 2021 population
Out[58]:
Country Code geometry Country Name_yield 2021_yield 2021_pop
95 IND POLYGON ((97.32711 28.26158, 97.40256 27.88254... India 3478.8 1.393409e+09
3 USA MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States 8268.0 3.318937e+08
7 IDN MULTIPOLYGON (((141.00021 -2.60015, 141.01706 ... Indonesia 5351.3 2.763618e+08
99 PAK POLYGON ((77.83745 35.49401, 76.87172 34.65354... Pakistan 3564.9 2.251999e+08
26 BRA POLYGON ((-53.37366 -33.76838, -53.65054 -33.2... Brazil 4478.7 2.139934e+08
... ... ... ... ... ...
149 CZE POLYGON ((15.01700 51.10667, 15.49097 50.78473... Czechia 6113.0 1.070345e+07
120 GRC MULTIPOLYGON (((26.29000 35.29999, 26.16500 35... Greece 4272.7 1.066457e+07
107 SWE POLYGON ((11.02737 58.85615, 11.46827 59.43239... Sweden 5064.7 1.041581e+07
128 PRT POLYGON ((-9.03482 41.88057, -8.67195 42.13469... Portugal 5379.7 1.029942e+07
80 JOR POLYGON ((35.54567 32.39399, 35.71992 32.70919... Jordan 2290.1 1.026902e+07

87 rows × 5 columns

In [59]:
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "black")
pop_10269022.plot(ax=ax, column='2021_yield', legend=True, legend_kwds={"label": 'cereal yield for only the countries having a population between 10269022 and 1393409034', "orientation":"horizontal"}, cmap='Set1')
plt.show()
[Figure: static choropleth of 2021 cereal yield for countries with a population between 10269022 and 1393409034]
In [60]:
pop_10269022.explore(column='2021_yield',   # make choropleth based on "2021_yield" column
    tooltip=["Country Name_yield",'2021_yield','2021_pop'],  # show "Country Name_yield",'2021_yield','2021_pop' in tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    tiles="openstreetmap",  # use "openstreetmap" tiles
    cmap="Set1",  # use "Set1" matplotlib colormap
    legend=True,                
    style_kwds=dict(color="black"), # use black outline
    zoom_control=True,
    #zoom_start = 1,
   #scrollWheelZoom=False
                    )
Out[60]:
[Interactive map: 2021 cereal yield for countries with a population between 10269022 and 1393409034]

Brief Explanation:

From the data frames and maps, I can observe that:

  1. The United States of America (gray) comes first in this category with a cereal yield of 8268.0 kg per hectare and a population of about 332 million.
  2. The European countries of Belgium at 7906.2 kg per hectare, the Netherlands at 7872.3 kg per hectare, and France at 7170.9 kg per hectare occupy the 2nd, 3rd, and 4th positions, respectively.
  3. Africa is represented in 5th position by Egypt, at 7132.5 kg per hectare.
  4. Africa is well represented at the bottom of the list, among the countries with the lowest cereal yield. Several of the countries at the bottom are experiencing or have recently experienced civil wars and internal crises.
  5. India has the highest population in this category, but its cereal yield (3478.8 kg per hectare) is well below that of Portugal (5379.7 kg per hectare), the country with the second-lowest population. This is a policy area of concern for India if it is to avoid the 'doom' of the Malthusian theory of population growth.

Task 1.2.4

Plot (scatter or line plot) the percentage change in cereal yield from 2011 to 2021, for the country having the highest population in 2021. In this question, you must consider the cereal yield for each year between 2011 and 2021. Very briefly interpret the generated plot

In [61]:
earth_yield2.columns # inspect the parent dataframe for this question
Out[61]:
Index(['Country Code', 'geometry', 'Country Name_yield', '2010_yield',
       '2011_yield', '2012_yield', '2013_yield', '2014_yield', '2015_yield',
       '2016_yield', '2017_yield', '2018_yield', '2019_yield', '2020_yield',
       '2021_yield', 'Country Name_pop', '2010_pop', '2011_pop', '2012_pop',
       '2013_pop', '2014_pop', '2015_pop', '2016_pop', '2017_pop', '2018_pop',
       '2019_pop', '2020_pop', '2021_pop'],
      dtype='object')
In [62]:
#select the useful columns and the cereal yield from 2010 to 2021 (2010 is the base year for the 2011 percentage change)
earth_yield_2010_2021 = earth_yield2[['Country Code', 'geometry', 'Country Name_yield', '2021_pop', '2010_yield', '2011_yield',
       '2012_yield', '2013_yield', '2014_yield', '2015_yield', '2016_yield',
       '2017_yield', '2018_yield', '2019_yield', '2020_yield', '2021_yield']]
In [63]:
earth_yield_2010_2021
Out[63]:
Country Code geometry Country Name_yield 2021_pop 2010_yield 2011_yield 2012_yield 2013_yield 2014_yield 2015_yield 2016_yield 2017_yield 2018_yield 2019_yield 2020_yield 2021_yield
0 FJI MULTIPOLYGON (((180.00000 -16.06713, 180.00000... Fiji 902899.0 2871.1 2821.3 2423.4 2999.5 4475.4 3001.0 3000.9 3000.9 3002.2 3353.4 3665.7 3887.3
1 TZA POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... Tanzania 61498438.0 1647.9 1390.4 1314.8 1418.0 1524.8 1449.4 1570.4 1692.0 1938.5 1883.3 1698.4 1651.1
2 CAN MULTIPOLYGON (((-122.84000 49.00000, -122.9742... Canada 38246108.0 3501.1 3552.4 3456.3 4160.4 3647.0 3673.0 4239.1 4104.8 3914.7 4010.8 4095.4 3078.3
3 USA MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States 331893745.0 6978.1 6803.5 5911.9 7300.9 7638.1 7430.1 8614.2 8281.3 8196.4 8006.1 8145.3 8268.0
4 KAZ POLYGON ((87.35997 49.21498, 86.59878 48.54918... Kazakhstan 19002586.0 804.1 1688.6 865.0 1164.9 1172.7 1278.1 1347.7 1355.0 1359.8 1154.5 1288.6 1049.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
163 BIH POLYGON ((18.56000 42.65000, 17.67492 43.02856... Bosnia and Herzegovina 3263459.0 3853.0 3724.9 3000.6 4026.9 3977.4 3812.1 5191.7 3732.1 5487.9 5354.7 6050.7 4422.1
164 MKD POLYGON ((22.38053 42.32026, 22.88137 41.99930... North Macedonia 2065092.0 3329.6 3502.1 2839.3 3381.3 3900.0 3051.1 3859.2 2807.5 3714.6 3554.0 3664.2 3537.8
165 SRB POLYGON ((18.82982 45.90887, 18.82984 45.90888... Serbia 6844078.0 4958.8 4751.4 3701.7 5157.8 5960.6 4787.3 6165.9 3967.9 6130.6 6126.9 6559.4 5768.6
166 MNE POLYGON ((20.07070 42.58863, 19.80161 42.50009... Montenegro 620173.0 3321.4 3305.7 2638.8 3770.5 3451.5 3146.5 3261.7 3288.0 3311.8 3166.8 3260.0 3330.7
167 TTO POLYGON ((-61.68000 10.76000, -61.10500 10.890... Trinidad and Tobago 1403374.0 1667.4 1639.5 1471.7 1610.5 1329.4 1110.4 1444.3 1627.5 1691.7 1529.7 1520.3 1588.1

164 rows × 16 columns

In [64]:
# I will use the .idxmax() method to get the highest population in 2021
high_pop_2021 = earth_yield_2010_2021.loc[earth_yield_2010_2021['2021_pop'].idxmax()]
In [65]:
high_pop_2021 # The country with the highest 2021 population is China, at 1412360000.0
Out[65]:
Country Code                                                        CHN
geometry              MULTIPOLYGON (((109.47520958866365 18.19770091...
Country Name_yield                                                China
2021_pop                                                   1412360000.0
2010_yield                                                       5526.1
2011_yield                                                       5709.4
2012_yield                                                       5827.1
2013_yield                                                       5894.1
2014_yield                                                       5893.2
2015_yield                                                       5985.7
2016_yield                                                       6017.6
2017_yield                                                       6111.3
2018_yield                                                       6125.4
2019_yield                                                       6265.9
2020_yield                                                       6314.2
2021_yield                                                       6320.8
Name: 136, dtype: object
In [66]:
high_pop_2021 = high_pop_2021.iloc[4:]  # keep only the cereal yield values, 2010 to 2021
In [67]:
high_pop_2021
Out[67]:
2010_yield    5526.1
2011_yield    5709.4
2012_yield    5827.1
2013_yield    5894.1
2014_yield    5893.2
2015_yield    5985.7
2016_yield    6017.6
2017_yield    6111.3
2018_yield    6125.4
2019_yield    6265.9
2020_yield    6314.2
2021_yield    6320.8
Name: 136, dtype: object
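
Selecting the yield values by position (`iloc[4:]`) depends on the column order staying fixed, and the result keeps the `object` dtype of the mixed row. A label-based sketch is shown below, assuming the `<year>_yield` naming convention used in this notebook. China's values are copied from the output above, with only a few years included for brevity and the geometry omitted.

```python
import pandas as pd

# China's row as a Series (subset of the values shown above)
row = pd.Series({
    "Country Code": "CHN",
    "Country Name_yield": "China",
    "2021_pop": 1412360000.0,
    "2010_yield": 5526.1,
    "2011_yield": 5709.4,
    "2020_yield": 6314.2,
    "2021_yield": 6320.8,
})

# Keep index labels of the form '<year>_yield'; this excludes 'Country Name_yield'.
# Casting to float gives a numeric Series that is safe for pct_change().
yields = row.filter(regex=r"^\d{4}_yield$").astype(float)
print(yields)
```

A plain `like="_yield"` filter would also match `Country Name_yield`, which is why the pattern anchors on four digits.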
In [68]:
high_pop_2021.pct_change()
Out[68]:
2010_yield         NaN
2011_yield    0.033170
2012_yield    0.020615
2013_yield    0.011498
2014_yield   -0.000153
2015_yield    0.015696
2016_yield    0.005329
2017_yield    0.015571
2018_yield    0.002307
2019_yield    0.022937
2020_yield    0.007708
2021_yield    0.001045
Name: 136, dtype: float64
In [69]:
#Plotting percentage change in cereal yield from 2011 to 2021
high_pop_2021.pct_change().plot()
Out[69]:
<Axes: >
[Figure: line plot of the year-on-year change in China's cereal yield, 2011 to 2021]

Brief Explanation

  1. The year-on-year change in China's cereal yield started at 3.317% in 2011 (pct_change() returns fractions, so 0.033170 corresponds to 3.317%).
  2. Growth slowed steadily between 2011 and 2013, and in 2014 it turned slightly negative at -0.0153%.
  3. There was quite a sharp rise in 2015 (1.570%) and a drop again in 2016 (0.533%).
  4. Growth rose in 2017 by almost the same amount as it fell in 2016. That is, the gradient of the fall in 2016 is almost the same as the gradient of the rise in 2017.
  5. There is a sharp rise between 2018 (0.231%) and 2019 (2.294%).
  6. From 2019 to 2021, the growth rate fell continuously, although the yield itself still increased each year.
  7. It closed at 0.1045% in 2021 after a series of falls and rises over the period under consideration.
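
Since `pct_change()` returns fractions rather than percentages, the values need to be multiplied by 100 before they are read as percent. A minimal sketch follows, using China's yields from the output above. The overall 2011 to 2021 change is added as an assumption about what may also be useful for the interpretation; it is not part of the original cells.

```python
import pandas as pd

# China's cereal yield in kg per hectare, 2010 to 2021 (from the output above)
china = pd.Series(
    [5526.1, 5709.4, 5827.1, 5894.1, 5893.2, 5985.7,
     6017.6, 6111.3, 6125.4, 6265.9, 6314.2, 6320.8],
    index=range(2010, 2022),
    dtype=float,
)

# pct_change() returns fractions; multiply by 100 for percent and drop the 2010 NaN
pct = china.pct_change().mul(100).dropna()
print(pct.round(3))

# Overall change from 2011 to 2021, for comparison with the year-on-year figures
overall = (china.loc[2021] / china.loc[2011] - 1) * 100
print(round(overall, 2))
```

Calling `pct.plot(marker="o")` on this Series would give the same line plot with a percent scale on the y-axis.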

Task 1.2.5

Present a scatter plot between the mean population of each country and the mean cereal yield from the year 2011 until 2021. Very briefly interpret the generated plot, particularly looking for any correlation (if present) among the plotted variables. In this question, you must consider each year between 2011 and 2021 to find the mean population and mean cereal yield.

In [70]:
earth_yield2.columns
Out[70]:
Index(['Country Code', 'geometry', 'Country Name_yield', '2010_yield',
       '2011_yield', '2012_yield', '2013_yield', '2014_yield', '2015_yield',
       '2016_yield', '2017_yield', '2018_yield', '2019_yield', '2020_yield',
       '2021_yield', 'Country Name_pop', '2010_pop', '2011_pop', '2012_pop',
       '2013_pop', '2014_pop', '2015_pop', '2016_pop', '2017_pop', '2018_pop',
       '2019_pop', '2020_pop', '2021_pop'],
      dtype='object')

I will drop the 2010 columns. They were used to explain percentage change in crop yield only.

In [71]:
earth_yield2.drop(['2010_yield', '2010_pop'], axis=1, inplace=True)
In [72]:
earth_yield2.columns
Out[72]:
Index(['Country Code', 'geometry', 'Country Name_yield', '2011_yield',
       '2012_yield', '2013_yield', '2014_yield', '2015_yield', '2016_yield',
       '2017_yield', '2018_yield', '2019_yield', '2020_yield', '2021_yield',
       'Country Name_pop', '2011_pop', '2012_pop', '2013_pop', '2014_pop',
       '2015_pop', '2016_pop', '2017_pop', '2018_pop', '2019_pop', '2020_pop',
       '2021_pop'],
      dtype='object')
In [73]:
#Get the mean of the cereal yield and add mean yield column to the earth_yield data frame
In [74]:
earth_yield2['Mean_yield']=earth_yield2.iloc[:,3:13].mean(axis=1)
In [75]:
earth_yield2[:5]
Out[75]:
Country Code geometry Country Name_yield 2011_yield 2012_yield 2013_yield 2014_yield 2015_yield 2016_yield 2017_yield ... 2013_pop 2014_pop 2015_pop 2016_pop 2017_pop 2018_pop 2019_pop 2020_pop 2021_pop Mean_yield
0 FJI MULTIPOLYGON (((180.00000 -16.06713, 180.00000... Fiji 2821.3 2423.4 2999.5 4475.4 3001.0 3000.9 3000.9 ... 865602.0 866447.0 868632.0 872406.0 877460.0 883490.0 889955.0 896444.0 902899.0 3174.37
1 TZA POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... Tanzania 1390.4 1314.8 1418.0 1524.8 1449.4 1570.4 1692.0 ... 48483132.0 49960563.0 51482638.0 53049231.0 54660345.0 56313444.0 58005461.0 59734213.0 61498438.0 1588.00
2 CAN MULTIPOLYGON (((-122.84000 49.00000, -122.9742... Canada 3552.4 3456.3 4160.4 3647.0 3673.0 4239.1 4104.8 ... 35082954.0 35437435.0 35702908.0 36109487.0 36545236.0 37065084.0 37601230.0 38037204.0 38246108.0 3885.39
3 USA MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States 6803.5 5911.9 7300.9 7638.1 7430.1 8614.2 8281.3 ... 316059947.0 318386329.0 320738994.0 323071755.0 325122128.0 326838199.0 328329953.0 331501080.0 331893745.0 7632.78
4 KAZ POLYGON ((87.35997 49.21498, 86.59878 48.54918... Kazakhstan 1688.6 865.0 1164.9 1172.7 1278.1 1347.7 1355.0 ... 17035551.0 17288285.0 17542806.0 17794055.0 18037776.0 18276452.0 18513673.0 18755666.0 19002586.0 1267.49

5 rows × 27 columns

In [76]:
earth_yield2.columns
Out[76]:
Index(['Country Code', 'geometry', 'Country Name_yield', '2011_yield',
       '2012_yield', '2013_yield', '2014_yield', '2015_yield', '2016_yield',
       '2017_yield', '2018_yield', '2019_yield', '2020_yield', '2021_yield',
       'Country Name_pop', '2011_pop', '2012_pop', '2013_pop', '2014_pop',
       '2015_pop', '2016_pop', '2017_pop', '2018_pop', '2019_pop', '2020_pop',
       '2021_pop', 'Mean_yield'],
      dtype='object')
In [77]:
earth_yield2['Mean_pop']=earth_yield2.iloc[:,15:25].mean(axis=1)
In [78]:
earth_yield2[:2]
Out[78]:
Country Code geometry Country Name_yield 2011_yield 2012_yield 2013_yield 2014_yield 2015_yield 2016_yield 2017_yield ... 2014_pop 2015_pop 2016_pop 2017_pop 2018_pop 2019_pop 2020_pop 2021_pop Mean_yield Mean_pop
0 FJI MULTIPOLYGON (((180.00000 -16.06713, 180.00000... Fiji 2821.3 2423.4 2999.5 4475.4 3001.0 3000.9 3000.9 ... 866447.0 868632.0 872406.0 877460.0 883490.0 889955.0 896444.0 902899.0 3174.37 874895.2
1 TZA POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... Tanzania 1390.4 1314.8 1418.0 1524.8 1449.4 1570.4 1692.0 ... 49960563.0 51482638.0 53049231.0 54660345.0 56313444.0 58005461.0 59734213.0 61498438.0 1588.00 52441558.0

2 rows × 28 columns
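
Note that `iloc` slices exclude the stop position, so `iloc[:, 3:13]` covers the ten columns `2011_yield` to `2020_yield` and `iloc[:, 15:25]` covers `2011_pop` to `2020_pop`; the 2021 columns are left out of both means. A label-based sketch that includes all eleven years is given below, using Fiji's and Tanzania's yields from the outputs above. The population columns could be handled the same way.

```python
import pandas as pd

years = range(2011, 2022)  # 2011..2021 inclusive
yield_cols = [f"{y}_yield" for y in years]

data = pd.DataFrame(
    [[2821.3, 2423.4, 2999.5, 4475.4, 3001.0, 3000.9,
      3000.9, 3002.2, 3353.4, 3665.7, 3887.3],
     [1390.4, 1314.8, 1418.0, 1524.8, 1449.4, 1570.4,
      1692.0, 1938.5, 1883.3, 1698.4, 1651.1]],
    index=["Fiji", "Tanzania"],
    columns=yield_cols,
)

# Selecting by label cannot silently drop the last year
data["Mean_yield"] = data[yield_cols].mean(axis=1)

# For comparison: the same mean over 2011..2020 only, as iloc[:, 3:13] computes it
mean_2011_2020 = data[yield_cols[:-1]].mean(axis=1)
print(data["Mean_yield"].round(2))
print(mean_2011_2020.round(2))
```

The ten-year means reproduce the `Mean_yield` values shown above (3174.37 for Fiji, 1588.00 for Tanzania), while the eleven-year means differ slightly.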

Select useful columns

In [79]:
earth_mean = earth_yield2[['geometry','Country Name_yield','Mean_yield', 'Mean_pop']]
In [80]:
earth_mean.sort_values(by='Mean_yield', ascending= False) #sort descending by mean yield
Out[80]:
geometry Country Name_yield Mean_yield Mean_pop
81 POLYGON ((51.57952 24.24550, 51.75744 24.29407... United Arab Emirates 25734.23 9390343.5
83 POLYGON ((47.97452 29.97582, 48.18319 29.53448... Kuwait 12671.55 3819773.3
85 MULTIPOLYGON (((55.20834 22.70833, 55.23449 23... Oman 12099.03 4286476.7
126 POLYGON ((6.15666 50.80372, 6.04307 50.12805, ... Belgium 8760.31 11295471.1
127 POLYGON ((6.90514 53.48216, 7.09205 53.14404, ... Netherlands 8385.30 17023700.7
... ... ... ... ...
86 MULTIPOLYGON (((167.21680 -15.89185, 167.84488... Vanuatu 604.37 274734.8
13 POLYGON ((24.56737 8.22919, 23.80581 8.66632, ... Sudan 600.36 39462148.6
46 POLYGON ((29.43219 -22.09131, 28.01724 -22.827... Botswana 533.02 2160123.9
52 POLYGON ((14.85130 22.86295, 15.09689 21.30852... Niger 498.37 20500747.4
47 POLYGON ((19.89577 -24.76779, 19.89473 -28.461... Namibia 426.54 2341771.5

164 rows × 4 columns

In [81]:
earth_mean.sort_values(by='Mean_pop', ascending= False) #sort descending by mean population
Out[81]:
geometry Country Name_yield Mean_yield Mean_pop
136 MULTIPOLYGON (((109.47521 18.19770, 108.65521 ... China 6014.39 1.381980e+09
95 POLYGON ((97.32711 28.26158, 97.40256 27.88254... India 3084.68 1.316492e+09
3 MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States 7632.78 3.215510e+08
7 MULTIPOLYGON (((141.00021 -2.60015, 141.01706 ... Indonesia 5149.45 2.596911e+08
26 POLYGON ((-53.37366 -33.76838, -53.65054 -33.2... Brazil 4786.45 2.052148e+08
... ... ... ... ...
145 POLYGON ((115.45071 5.44773, 115.40570 4.95523... Brunei Darussalam 1657.45 4.165801e+05
18 MULTIPOLYGON (((-78.98000 26.79000, -78.51000 ... Bahamas, The 7870.76 3.763192e+05
36 POLYGON ((-89.14308 17.80832, -89.15091 17.955... Belize 3462.48 3.643453e+05
86 MULTIPOLYGON (((167.21680 -15.89185, 167.84488... Vanuatu 604.37 2.747348e+05
131 POLYGON ((165.77999 -21.08000, 166.59999 -21.7... New Caledonia 5758.90 2.667010e+05

164 rows × 4 columns

INVESTIGATING RELATIONSHIP

My working hypothesis is:

H0: There is no significant relationship between the mean population and the mean yield.

Please note that I treat mean yield as the dependent variable, on the Y-axis, and mean population as the independent variable, on the X-axis.

In [82]:
earth_mean.plot(kind='scatter', x='Mean_pop', y='Mean_yield')
Out[82]:
<Axes: xlabel='Mean_pop', ylabel='Mean_yield'>
[Figure: scatter plot of Mean_yield against Mean_pop]

I will plot this with plotly.express for better view and easier explanation

In [83]:
import plotly.express as px
In [84]:
fig = px.scatter(earth_mean,x='Mean_pop', y='Mean_yield', color= 'Country Name_yield')
fig.show()

Points are too clustered at the bottom-left. I will cut off the outliers in mean population (at 300 million) and mean yield (at 10,000).

In [85]:
cut_outliers= earth_mean.loc[(earth_mean.Mean_pop<300000000) & (earth_mean.Mean_yield<10000)]
In [86]:
fig = px.scatter(cut_outliers,x='Mean_pop', y='Mean_yield', color= 'Country Name_yield')
fig.show()

Points are easier to view now. On visual inspection I see no clear relationship between the two variables, so I do not reject my null hypothesis at this point. However, I shall also attempt a spatial view of possible relationships.
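
A visual impression can be backed by a correlation coefficient. The sketch below computes Pearson's r and a rank-based (Spearman) coefficient with pandas on ten rows copied from the sorted outputs above. It is an illustration on a small, non-random subset, so the numbers it prints say nothing about the full 164-country frame. A formal test of the null hypothesis would additionally need a p-value, for example from `scipy.stats.pearsonr`.

```python
import pandas as pd

# Ten rows copied from the sorted outputs above (illustrative subset only)
subset = pd.DataFrame(
    {
        "Mean_pop": [9390343.5, 3819773.3, 4286476.7, 11295471.1, 17023700.7,
                     39462148.6, 20500747.4, 1.381980e9, 1.316492e9, 3.215510e8],
        "Mean_yield": [25734.23, 12671.55, 12099.03, 8760.31, 8385.30,
                       600.36, 498.37, 6014.39, 3084.68, 7632.78],
    },
    index=["United Arab Emirates", "Kuwait", "Oman", "Belgium", "Netherlands",
           "Sudan", "Niger", "China", "India", "United States"],
)

# Pearson's r measures linear association (the default method of Series.corr)
pearson_r = subset["Mean_pop"].corr(subset["Mean_yield"])

# Spearman's rho equals Pearson's r computed on ranks; it is less sensitive
# to the extreme populations of China and India
spearman_rho = subset["Mean_pop"].rank().corr(subset["Mean_yield"].rank())

print(round(pearson_r, 3), round(spearman_rho, 3))
```

On the real data the same two lines would be applied to `earth_mean["Mean_pop"]` and `earth_mean["Mean_yield"]`.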

In [87]:
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "black")
earth_mean.plot(ax=ax, column='Mean_yield', legend=True, legend_kwds={"label":'Mean cereal yield of countries between  2011 and 2021', "orientation":"horizontal"}, cmap='Set1')
plt.show()
[Figure: static choropleth of mean cereal yield by country, 2011 to 2021]
In [88]:
earth_mean.explore(column='Mean_yield',   # make choropleth based on the "Mean_yield" column
    tooltip=['Country Name_yield','Mean_yield','Mean_pop',],  # show country name, mean yield, and mean population in tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    tiles="openstreetmap",  # use "openstreetmap" tiles
    cmap="Set1",  # use "Set1" matplotlib colormap
    legend=True,                
    style_kwds=dict(color="black"), # use black outline
    zoom_control=True,
    zoom_start = 1,
                  )
Out[88]:
[Interactive map: mean cereal yield by country, 2011 to 2021]

Brief Explanation

  1. I did not find a significant relationship between the values of mean population and mean cereal yield.
  2. There is, however, a spatial relationship: yield is regionalized.
  3. Africa is the worst-performing region in cereal yield.
  4. No North American country is at the bottom in cereal yield.
  5. Bolivia is the only South American country at the bottom in cereal yield within the period under consideration.
  6. The United Arab Emirates has the highest mean cereal yield over the period, at 25734.23 kg per hectare.
  7. Kuwait (12671.55 kg per hectare) and Oman (12099.03 kg per hectare) are the 2nd and 3rd best-performing countries in cereal yield, respectively.
  8. Belgium and the Netherlands are the 4th and 5th best-performing countries in the world, and the 1st and 2nd in Europe, respectively.
  9. New Caledonia, despite its low population, performs well in cereal yield, with a mean of 5758.90 kg per hectare.
  10. African countries need to increase their cereal yield to prevent extreme hunger and improve their balance of trade.

TASK 2: Geospatial Sentiment Analysis Using Social Media Data

INTRODUCTION

In this part, I will apply geospatial sentiment analysis to Twitter data using the Python library TextBlob. I am provided with a dataset consisting of tweets relevant to cryptocurrency.

Task 2.1: Data Pre-processing

Instruction

Using a set of suitable Python libraries, randomly retrieve 500 tweets where user locations are available. You should also filter out the irrelevant characters, symbols, hashtags, URLs etc. from the tweets to avoid any possible masking of the actual sentiment associated with the tweets. From this point onward you should use the processed tweet data for all the subsequent analyses.

I will start by installing TextBlob.

TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as sentiment analysis. It is built on top of NLTK and Pattern (TextBlob: Simplified Text Processing — TextBlob 0.16.0 Documentation, 2023; Muhammad, 2023).

In [89]:
!pip install textblob
Requirement already satisfied: textblob in c:\users\user\anaconda3\lib\site-packages (0.17.1)
Requirement already satisfied: nltk>=3.1 in c:\users\user\anaconda3\lib\site-packages (from textblob) (3.8.1)
Requirement already satisfied: click in c:\users\user\anaconda3\lib\site-packages (from nltk>=3.1->textblob) (8.0.4)
Requirement already satisfied: joblib in c:\users\user\anaconda3\lib\site-packages (from nltk>=3.1->textblob) (1.2.0)
Requirement already satisfied: regex>=2021.8.3 in c:\users\user\anaconda3\lib\site-packages (from nltk>=3.1->textblob) (2022.7.9)
Requirement already satisfied: tqdm in c:\users\user\anaconda3\lib\site-packages (from nltk>=3.1->textblob) (4.65.0)
Requirement already satisfied: colorama in c:\users\user\anaconda3\lib\site-packages (from click->nltk>=3.1->textblob) (0.4.6)

Next, I import my Python libraries. Matplotlib is a Python visualization tool. Pandas will help me create and manage a dataframe.

In [90]:
from textblob import TextBlob
import matplotlib.pyplot as plt
import pandas as pd

The Bitcoin tweet file is huge, with more than 4 million rows. I beg your indulgence to read only the first 1 million rows, which I treat as a fair representation of the full dataset.

In [91]:
bitcoin =pd.read_csv(r"Bitcoin_tweets.csv", nrows=1000000) 
C:\Users\User\AppData\Local\Temp\ipykernel_8832\894283730.py:1: DtypeWarning:

Columns (5,7,8,12) have mixed types. Specify dtype option on import or set low_memory=False.
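The DtypeWarning above means pandas inferred different types for different chunks of columns 5, 7, 8 and 12. Counting from zero in the column header, these are user_friends, user_verified, date and is_retweet. One way to avoid the guess is to read such columns as text and convert them explicitly afterwards. Below is a minimal sketch on a tiny in-memory CSV; the sample rows are invented.

```python
import io
import pandas as pd

# Invented rows that mimic the mixed values seen in the real file.
csv_text = (
    "user_name,user_friends,user_verified\n"
    "alice,7605,FALSE\n"
    "bob,not available,False\n"
)

# Read the problem columns as plain text so pandas does not have to guess.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"user_friends": object, "user_verified": object},
)

# Convert explicitly; values that are not numbers become NaN.
df["user_friends"] = pd.to_numeric(df["user_friends"], errors="coerce")
# Normalise the two spellings of the boolean flag ("FALSE" and "False").
df["user_verified"] = df["user_verified"].str.lower().eq("true")
print(df.dtypes)
```

Passing low_memory=False, as the warning itself suggests, is the simpler alternative when memory allows.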

In [92]:
bitcoin
Out[92]:
user_name user_location user_description user_created user_followers user_friends user_favourites user_verified date text hashtags source is_retweet
0 DeSota Wilson Atlanta, GA Biz Consultant, real estate, fintech, startups... 39929.836910 8534 7605 4838.0 FALSE 44237.9993518519 Blue Ridge Bank shares halted by NYSE after #b... ['bitcoin'] Twitter Web App False
1 CryptoND NaN 😎 BITCOINLIVE is a Dutch platform aimed at inf... 43755.841782 6769 1532 25483.0 FALSE 44237.9991666667 😎 Today, that's this #Thursday, we will do a "... ['Thursday', 'Btc', 'wallet', 'security'] Twitter for Android False
2 Tdlmatias London, England IM Academy : The best #forex, #SelfEducation, ... 41953.451817 128 332 924.0 FALSE 44237.9963888889 Guys evening, I have read this article about B... NaN Twitter Web App False
3 Crypto is the future NaN I will post a lot of buying signals for BTC tr... 43736.700139 625 129 14.0 FALSE 44237.9962152778 $BTC A big chance in a billion! Price: \487264... ['Bitcoin', 'FX', 'BTC', 'crypto'] dlvr.it False
4 Alex Kirchmaier 🇦🇹🇸🇪 #FactsSuperspreader Europa Co-founder @RENJERJerky | Forbes 30Under30 | I... 42403.552720 1249 1472 10482.0 FALSE 44237.9959027778 This network is secured by 9 508 nodes as of t... ['BTC'] Twitter Web App False
... ... ... ... ... ... ... ... ... ... ... ... ... ...
999995 Bob Lji ∞/21 🟠 Interlaken, Schweiz DYOR + DCA + HODL #BITCOIN NoAlts+NoCryptos+No... 44355.652176 391 1558 1243.0 False 44423.537072 #wtfhappenedin1971? \n#Bitcoin will fix this! ['wtfhappenedin1971', 'Bitcoin'] Twitter for iPhone False
999996 MoonTrip Blockchain Crypto Enthusiast 44289.838113 384 491 4771.0 False 44423.537049 @CryptoNadine @BabyArabiaBSC #babyarabia massi... ['babyarabia', 'bitcoin', 'dev', 'sundayvibes'... Twitter for Android False
999997 I like Tea and #BSV NaN I like Tea and Buying #BSV 44315.271470 0 0 2.0 False 44423.536944 @mcuban Are you serious or part of the scam??\... ['BSV', 'BITCOIN', 'BSV', 'greentechnology'] Twitter for Android False
999998 The Blockchain Advisor™ Illinois, USA Bridging the Gap Between Traditional Investing... 40259.683646 970 854 9134.0 False 44423.536725 Why is #Bitcoin going to ascend to a global re... ['Bitcoin'] Twitter Web App False
999999 redforthebest NaN Creator of #CosmoHeads & #MintingClassics & #... 44313.291424 2357 4571 1448.0 False 44423.536528 #Bitcoin in #QRthings Collection\n(Only 0.03 e... ['Bitcoin', 'QRthings', 'Crypto', 'Ethereum', ... Twitter for Android False

1000000 rows × 13 columns
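Note that nrows=1000000 reads the first million rows in file order rather than a random selection. If a random subset of the whole file is preferred, rows can be skipped at random while reading. Below is a minimal sketch on an invented in-memory CSV; keep_fraction and the seed are hypothetical choices, not values used in this project.

```python
import io
import random
import pandas as pd

# Invented 1,000-row CSV standing in for Bitcoin_tweets.csv.
csv_text = "id,text\n" + "".join(f"{i},tweet {i}\n" for i in range(1000))

rng = random.Random(0)   # fixed seed so the same rows are kept on every run
keep_fraction = 0.25     # hypothetical share of rows to keep

# skiprows receives each row index; index 0 is the header and is always kept.
sampled = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i > 0 and rng.random() > keep_fraction,
)
print(sampled.shape)
```

The number of rows kept is only approximately keep_fraction of the file, because each row is decided independently.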

In [93]:
bitcoin = bitcoin.sample(n=1500)  # first selection of 1500 rows, to compensate for rows that will be dropped because of null values
In [94]:
bitcoin
Out[94]:
user_name user_location user_description user_created user_followers user_friends user_favourites user_verified date text hashtags source is_retweet
904542 probtclife ⚡ NaN #Bitcoin 44203.483310 27 142 4992.0 False 44427.516875 @GaryGensler @FoxBusiness #Bitcoin fixes this ['Bitcoin'] Twitter Web App False
222839 J***** T****** Mainz Get free $DFI worth 30$ @cakedefi, use the fol... 44107.647488 44 194 3930.0 False 44368.595752 @CaptainCryptoHD @Cryptanzee Thanks for the qu... ['Bitcoin'] Twitter for Android False
75023 Maximus Monius NaN Crypto News, Technical Analysis, Tips & Tricks... 44246.658889 72 130 102.0 False 44310.680347 #BITCOIN UPDATE - April 24th\n\nWe seem to be ... ['BITCOIN', 'BTC', 'ALTSEASON'] Twitter Web App False
951672 Rahul Singh Sirohi Ghaziabad, India Advisor & Developer in #Crypto Industry Since ... 42906.352164 1406 1980 5134.0 False 44425.657338 When #Bitcoin Crashed &amp; goes to #Bearish M... ['Bitcoin', 'Bearish', 'BTC', 'Crypto', 'AskTo... Twitter for Android False
339824 JcL.Rheendy NaN NaN 44039.657245 40 1357 594.0 False 44379.606065 @WSB_WallStreet one of the very big giveaways ... ['giveaway', 'bitcoin', 'USDC', 'ETHEREUM', 'T... Twitter for Android False
... ... ... ... ... ... ... ... ... ... ... ... ... ...
912730 BitcoinAgile Matter Doesn't Matter Breaking News. Bitcoin, Blockchain & Beyond. #... 41646.994977 59940 13883 8737.0 False 44427.132847 TA: #bitcoin Turns Red, What Could Trigger Mor... ['bitcoin', 'btcusd', 'btcusdt', 'xbtusd'] bitcoinagile False
347606 VaxBLR Bengaluru, India Hourly updates on FREE and PAID 18+ and 45+ va... 44368.364282 12 0 0.0 False 44400.375208 45+ #URBAN #Bengaluru #CovidVaccine Availabili... ['URBAN', 'Bengaluru', 'CovidVaccine', 'COVISH... VaxBlr False
426243 𝕍ℝaj 🇮🇳 Mumbai Beating Inflation by Equities. Penchant for Pa... 40552.570046 117 917 4369.0 False 44399.662593 @ramrastogi @LinkedIn Sir I think inexorable c... ['Bitcoin'] Twitter for Android False
366435 thank u, next Greece NaN 40998.724236 699 1954 96384.0 False 44400.478669 The opening ceremony is so boring like people ... ['OlympicGames'] Twitter for iPhone False
386290 Mantas NaN Crypto enthusiast 🪙🚀🌕🐂 43099.398368 16 175 999.0 False 44401.31206 Bullish. #BTC #Bitcoin ['BTC', 'Bitcoin'] Twitter for iPhone False

1500 rows × 13 columns
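Because sample() draws different rows on every run, the counts reported further down will change if the notebook is re-executed. Passing random_state makes the draw repeatable. A minimal sketch on invented data follows; the seed 42 is arbitrary.

```python
import pandas as pd

toy = pd.DataFrame({"text": [f"tweet {i}" for i in range(100)]})

first = toy.sample(n=10, random_state=42)    # 42 is an arbitrary seed
second = toy.sample(n=10, random_state=42)   # same seed, same rows
print(first.index.equals(second.index))      # True
```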

In [95]:
bitcoin.isna().sum()
Out[95]:
user_name             0
user_location       710
user_description    163
user_created          0
user_followers        0
user_friends          0
user_favourites       0
user_verified         0
date                  0
text                  0
hashtags             31
source                3
is_retweet            0
dtype: int64
In [96]:
bitcoin.dropna(inplace= True)
In [97]:
bitcoin.isna().sum()
Out[97]:
user_name           0
user_location       0
user_description    0
user_created        0
user_followers      0
user_friends        0
user_favourites     0
user_verified       0
date                0
text                0
hashtags            0
source              0
is_retweet          0
dtype: int64
In [98]:
bitcoin.shape
Out[98]:
(737, 13)
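The row count fell from 1500 to 737 even though only 710 rows lacked a user_location. This is because dropna() without arguments also removes rows that are missing user_description, hashtags or source. If only the location and text matter for the analysis, the subset argument keeps more rows. A minimal sketch on invented data:

```python
import pandas as pd

toy = pd.DataFrame({
    "user_location": ["Mainz", None, "Moon", "Raleigh, NC"],
    "user_description": [None, "trader", "analyst", None],
    "text": ["a", "b", "c", "d"],
})

strict = toy.dropna()                                     # drops a row if ANY column is missing
targeted = toy.dropna(subset=["user_location", "text"])   # checks only the columns the analysis needs
print(len(strict), len(targeted))                         # 1 3
```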
In [99]:
bitcoin #inspect DF
Out[99]:
user_name user_location user_description user_created user_followers user_friends user_favourites user_verified date text hashtags source is_retweet
222839 J***** T****** Mainz Get free $DFI worth 30$ @cakedefi, use the fol... 44107.647488 44 194 3930.0 False 44368.595752 @CaptainCryptoHD @Cryptanzee Thanks for the qu... ['Bitcoin'] Twitter for Android False
951672 Rahul Singh Sirohi Ghaziabad, India Advisor & Developer in #Crypto Industry Since ... 42906.352164 1406 1980 5134.0 False 44425.657338 When #Bitcoin Crashed &amp; goes to #Bearish M... ['Bitcoin', 'Bearish', 'BTC', 'Crypto', 'AskTo... Twitter for Android False
788279 🔔system'cRe5520' AWS eu-west-1a Ireland Region The channel breakout trading strategy bot for ... 41062.324120 158 16 63.0 False 44417.625035 strategy: 5010HL1h atr20d: 2196.69\n\n09 Aug 2... ['BTC', 'BitMEX'] system'cRe5520' False
456359 Cryptorphic Moon #Bitcoin Certified Technical Analyst I Margin ... 43283.686458 4176 33 249.0 False 44398.774907 IMHO Do not long or Short anything now with le... ['Crypto', 'Btc', 'Bitcoin'] Twitter for iPhone False
741240 DoopieCash® The Hague, The Netherlands ▫ Professional Crypto & FX Trader ▫ Technical ... 43203.412604 21365 209 4714.0 False 44417.600694 Time to melt 🔥🔥🔥\n\n#Bitcoin on fire\n\n#BTC ... ['Bitcoin', 'BTC', 'crypto'] Twitter Web App False
... ... ... ... ... ... ... ... ... ... ... ... ... ...
356131 JJ Raleigh, NC nobody really cares what I put here. 40875.925961 306 462 8626.0 False 44393.935093 So many dumb replies. The #CovidVaccine does n... ['CovidVaccine', 'VaccineEducation'] Twitter Web App False
665361 Andre phill Ontario, Canada Masters at business Administration-MBA at Ohio... 44403.469086 8 49 12.0 FALSE 44405.1377083333 @APompliano Trust has an unlimited value, The ... ['BitcoinCash', 'money', 'dollar', 'btc'] Twitter for iPhone False
912730 BitcoinAgile Matter Doesn't Matter Breaking News. Bitcoin, Blockchain & Beyond. #... 41646.994977 59940 13883 8737.0 False 44427.132847 TA: #bitcoin Turns Red, What Could Trigger Mor... ['bitcoin', 'btcusd', 'btcusdt', 'xbtusd'] bitcoinagile False
347606 VaxBLR Bengaluru, India Hourly updates on FREE and PAID 18+ and 45+ va... 44368.364282 12 0 0.0 False 44400.375208 45+ #URBAN #Bengaluru #CovidVaccine Availabili... ['URBAN', 'Bengaluru', 'CovidVaccine', 'COVISH... VaxBlr False
426243 𝕍ℝaj 🇮🇳 Mumbai Beating Inflation by Equities. Penchant for Pa... 40552.570046 117 917 4369.0 False 44399.662593 @ramrastogi @LinkedIn Sir I think inexorable c... ['Bitcoin'] Twitter for Android False

737 rows × 13 columns

In [100]:
bitcoin['user_location'] #inspect location
Out[100]:
222839                            Mainz
951672                 Ghaziabad, India
788279    AWS eu-west-1a Ireland Region
456359                             Moon
741240       The Hague, The Netherlands
                      ...              
356131                      Raleigh, NC
665361                  Ontario, Canada
912730            Matter Doesn't Matter
347606                 Bengaluru, India
426243                          Mumbai 
Name: user_location, Length: 737, dtype: object

I will keep only the columns that are useful to my study.

In [101]:
bitcoin.columns
Out[101]:
Index(['user_name', 'user_location', 'user_description', 'user_created',
       'user_followers', 'user_friends', 'user_favourites', 'user_verified',
       'date', 'text', 'hashtags', 'source', 'is_retweet'],
      dtype='object')
In [102]:
bitcoin = bitcoin[['user_location', 'text', 'user_followers']]
In [103]:
bitcoin
Out[103]:
user_location text user_followers
222839 Mainz @CaptainCryptoHD @Cryptanzee Thanks for the qu... 44
951672 Ghaziabad, India When #Bitcoin Crashed &amp; goes to #Bearish M... 1406
788279 AWS eu-west-1a Ireland Region strategy: 5010HL1h atr20d: 2196.69\n\n09 Aug 2... 158
456359 Moon IMHO Do not long or Short anything now with le... 4176
741240 The Hague, The Netherlands Time to melt 🔥🔥🔥\n\n#Bitcoin on fire\n\n#BTC ... 21365
... ... ... ...
356131 Raleigh, NC So many dumb replies. The #CovidVaccine does n... 306
665361 Ontario, Canada @APompliano Trust has an unlimited value, The ... 8
912730 Matter Doesn't Matter TA: #bitcoin Turns Red, What Could Trigger Mor... 59940
347606 Bengaluru, India 45+ #URBAN #Bengaluru #CovidVaccine Availabili... 12
426243 Mumbai @ramrastogi @LinkedIn Sir I think inexorable c... 117

737 rows × 3 columns

In [104]:
import re

I am going to define a function that identifies and removes irrelevant characters from the tweets, which is why I imported re above. A regular expression (or RE) specifies a set of strings that match it; the functions in this module let you check if a particular string matches a given regular expression (Re: Regular Expression Operations, 2023).

In [105]:
def remove_rt(x): return re.sub(r'RT @\w+: ', " ", x)  # strip the retweet prefix "RT @user: "
 
def rt(x): return re.sub(  # strip @mentions, non-alphanumeric symbols, and URLs
    r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", x)
 
bitcoin["text"] = bitcoin.text.map(remove_rt).map(rt)
bitcoin["text"] = bitcoin.text.str.lower()
C:\Users\User\AppData\Local\Temp\ipykernel_8832\2926551215.py:6: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

C:\Users\User\AppData\Local\Temp\ipykernel_8832\2926551215.py:7: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
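The SettingWithCopyWarning appears because bitcoin was created by selecting columns from another DataFrame, so pandas cannot tell whether the assignment should affect the original. Taking an explicit copy at selection time removes the ambiguity. A minimal sketch on invented data:

```python
import pandas as pd

raw = pd.DataFrame({
    "user_location": ["Mainz", "Moon"],
    "text": ["Hello #BTC", "To the MOON"],
    "user_followers": [44, 4176],
})

# .copy() makes the selection an independent DataFrame, so later assignments are unambiguous.
subset = raw[["user_location", "text"]].copy()
subset["text"] = subset["text"].str.lower()
print(subset["text"].tolist())
```

In this notebook the equivalent change would be to append .copy() to the column selection in cell 102.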

In [106]:
bitcoin #inspect dataset
Out[106]:
user_location text user_followers
222839 Mainz thanks for the question my favourite one... 44
951672 Ghaziabad, India when bitcoin crashed amp goes to bearish m... 1406
788279 AWS eu-west-1a Ireland Region strategy 5010hl1h atr20d 2196 69 09 aug 202... 158
456359 Moon imho do not long or short anything now with le... 4176
741240 The Hague, The Netherlands time to melt bitcoin on fire btc cry... 21365
... ... ... ...
356131 Raleigh, NC so many dumb replies the covidvaccine does n... 306
665361 Ontario, Canada trust has an unlimited value the father of ... 8
912730 Matter Doesn't Matter ta bitcoin turns red what could trigger mor... 59940
347606 Bengaluru, India 45 urban bengaluru covidvaccine availabili... 12
426243 Mumbai sir i think inexorable certainty that cent... 117

737 rows × 3 columns
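To make the effect of the cleaning step concrete, the sketch below applies the same patterns to one invented tweet. It differs from the cell above in two labeled ways: html.unescape turns entities such as &amp; back into characters before cleaning (the output above still shows a stray "amp"), and runs of spaces are collapsed. The function name clean_tweet is hypothetical.

```python
import html
import re

RT_PREFIX = re.compile(r"RT @\w+: ")
# Same three alternatives as above: @mentions, non-alphanumeric symbols, URLs.
NOISE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+://\S+)")

def clean_tweet(text):
    text = html.unescape(text)             # addition: "&amp;" becomes "&", which is then removed as a symbol
    text = RT_PREFIX.sub(" ", text)        # drop the retweet prefix
    text = NOISE.sub(" ", text)            # drop mentions, symbols and URLs
    return " ".join(text.lower().split())  # addition: lowercase and collapse repeated spaces

sample = "RT @user1: When #Bitcoin Crashed &amp; goes up https://t.co/abc @bob!"
print(clean_tweet(sample))  # when bitcoin crashed goes up
```

Note that the hashtag symbol is removed but the hashtag word itself is kept, as in the original cell.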

In [107]:
print(bitcoin['user_location'].value_counts()) #inspect before geocoding
user_location
Bay Area, CA                 25
United States                13
Global                       12
London, England              12
Australia                    11
                             ..
Croatia                       1
Bangladesh...... Khulna       1
Lewes, DE                     1
florida                       1
Mumbai                        1
Name: count, Length: 469, dtype: int64

Task 2.2: Geocoding

Geocode on all the 500 tweets retrieved and filtered in the previous step. To perform geocoding, you must be using a Python-based tool. Once the geocoding is performed then augment the tweet data set with two extra columns. One column should contain latitude and the other one should contain longitude information corresponding to a tweet

geopy is a Python client for several popular geocoding web services (Geopy, 2023).

geopy. (2023, November 23). PyPI. https://pypi.org/project/geopy/

In [108]:
pip install geopy 
Requirement already satisfied: geopy in c:\users\user\anaconda3\lib\site-packages (2.4.1)
Requirement already satisfied: geographiclib<3,>=1.52 in c:\users\user\anaconda3\lib\site-packages (from geopy) (2.0)
Note: you may need to restart the kernel to use updated packages.
In [109]:
# Import Nominatim
from geopy.geocoders import Nominatim
In [110]:
geolocator = Nominatim(user_agent="Assessment")
In [111]:
from geopy.extra.rate_limiter import RateLimiter

I am going to duplicate the user_location column under the name locations. The aim is to have a column that keeps the original user location after geocoding is applied to the dataframe.

In [112]:
bitcoin['locations'] = bitcoin['user_location'] 
C:\Users\User\AppData\Local\Temp\ipykernel_8832\3072469440.py:1: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

In [113]:
bitcoin
Out[113]:
user_location text user_followers locations
222839 Mainz thanks for the question my favourite one... 44 Mainz
951672 Ghaziabad, India when bitcoin crashed amp goes to bearish m... 1406 Ghaziabad, India
788279 AWS eu-west-1a Ireland Region strategy 5010hl1h atr20d 2196 69 09 aug 202... 158 AWS eu-west-1a Ireland Region
456359 Moon imho do not long or short anything now with le... 4176 Moon
741240 The Hague, The Netherlands time to melt bitcoin on fire btc cry... 21365 The Hague, The Netherlands
... ... ... ... ...
356131 Raleigh, NC so many dumb replies the covidvaccine does n... 306 Raleigh, NC
665361 Ontario, Canada trust has an unlimited value the father of ... 8 Ontario, Canada
912730 Matter Doesn't Matter ta bitcoin turns red what could trigger mor... 59940 Matter Doesn't Matter
347606 Bengaluru, India 45 urban bengaluru covidvaccine availabili... 12 Bengaluru, India
426243 Mumbai sir i think inexorable certainty that cent... 117 Mumbai

737 rows × 4 columns

In [114]:
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
In [ ]:
bitcoin['user_location'] = bitcoin['locations'].apply(geocode)
bitcoin['latitude'] = bitcoin['user_location'].apply(lambda x: x.latitude if x else None)
bitcoin['longitude'] = bitcoin['user_location'].apply(lambda x: x.longitude if x else None)
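With a one-second delay per request, geocoding every row repeats work for locations that occur many times (the value counts above show 469 distinct strings among 737 rows). One option is to geocode each distinct string once and map the result back to all rows. Below is a minimal sketch; geocode_unique and the stand-in geocoder are hypothetical, and the coordinates in the stand-in are placeholders so that the example runs offline.

```python
from collections import namedtuple
import math
import pandas as pd

def geocode_unique(locations, geocode):
    """Geocode each distinct location string once, then map results back to every row."""
    lookup = {}
    for name in locations.dropna().unique():
        hit = geocode(name)  # expected to return an object with .latitude/.longitude, or None
        lookup[name] = (hit.latitude, hit.longitude) if hit else (math.nan, math.nan)
    missing = (math.nan, math.nan)
    return pd.DataFrame(
        {
            "latitude": [lookup.get(name, missing)[0] for name in locations],
            "longitude": [lookup.get(name, missing)[1] for name in locations],
        },
        index=locations.index,
    )

# Stand-in geocoder so the sketch runs offline; the coordinates are placeholders.
FakeHit = namedtuple("FakeHit", ["latitude", "longitude"])
calls = []

def fake_geocode(name):
    calls.append(name)
    return {"Mainz": FakeHit(50.0, 8.27)}.get(name)

demo = pd.Series(["Mainz", "Matter Doesn't Matter", "Mainz"])
coords = geocode_unique(demo, fake_geocode)
print(len(calls))  # 2 lookups for 3 rows
```

With the real service, the second argument would be the RateLimiter-wrapped geocode defined above.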
In [ ]:
bitcoin
In [ ]:
print(bitcoin['locations'].value_counts())
In [ ]:
bitcoin.dropna(inplace = True)
In [ ]:
bitcoin.isna().sum()
In [ ]:
bitcoin

The next step is to take a sample of 500 at this point, as all tweets remaining in my bitcoin dataframe have a location.

In [ ]:
bitcoin2=bitcoin.sample(n=500) 
In [ ]:
bitcoin2
In [ ]:
bitcoin2.to_csv('bitcoin_final.csv') #save dataframe as a csv file for future reference
In [117]:
bitcoin2 =gpd.read_file('bitcoin_final.csv')
In [118]:
bitcoin2
Out[118]:
field_1 user_location text user_followers locations latitude longitude geometry
0 714437 The Moon, Sylacauga, Talladega County, Alabama... talked to about stacks stx and hopes the... 39 The Moon 33.1584497 -86.2385846 None
1 280261 Lisboa, Portugal i m so f tired of ignorant american phd econom... 1614 Lisbon, Portugal 38.7077507 -9.1365919 None
2 336653 Muhu, Saare maakond, Eesti long bitcoin 43 Moon 58.5959044 23.21964608602439 None
3 872349 Global, 81, Barber Greene Road, Don Mills, Don... bitcoin bull cathie wood attracts big short... 77240 Global 43.7283874 -79.34914879325001 None
4 109405 Canada bitcoin price in us dollar btc usd btcusd ... 250 Canada 61.0666922 -107.991707 None
... ... ... ... ... ... ... ... ...
495 898076 Hyderabad, Bahadurpura mandal, Hyderabad Distr... freedom 35 official bsc bitcoin via 112 Hyderabad, India 17.360589 78.4740613 None
496 951645 b'424c4f434b434841494e2c20d09dd0b0d0b1d0b5d180... bitcoin btc i don t like these lower highs... 21254 #blockchain 44.6465984 34.4007341 None
497 7974 Sydney, Council of the City of Sydney, New Sou... what goes up must comes down hard question ... 26 Sydney, New South Wales -33.8698439 151.2082848 None
498 788995 Montréal, Agglomération de Montréal, Montr... am i the only one that likes watching the sats... 135 Montréal, Québec 45.5031824 -73.5698065 None
499 62652 Laguna Beach, Orange County, California, Unite... lamb is a fast safe and scalable blockchai... 3784 Laguna Beach, CA 33.5426975 -117.785366 None

500 rows × 8 columns
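In the output above, the saved index comes back as an extra field_1 column, the accented characters are garbled, and the numeric columns appear to be read as text. A plain pandas round trip avoids these side effects. Below is a minimal sketch using an in-memory buffer and one invented row; with a real file, I would assume encoding="utf-8" is passed to both to_csv and read_csv.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "locations": ["Montréal, Québec"],
    "latitude": [45.5031824],
    "longitude": [-73.5698065],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False avoids the extra index column on reload
buffer.seek(0)

restored = pd.read_csv(buffer)   # numeric columns come back as floats
print(restored.dtypes)
```

The restored frame can then be passed to gpd.GeoDataFrame together with gpd.points_from_xy to build the geometry.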

In [119]:
bitcoin2.reset_index(drop=True, inplace=True) #reset index for a nice looking DF
In [120]:
bitcoin2 #inspect
Out[120]:
field_1 user_location text user_followers locations latitude longitude geometry
0 714437 The Moon, Sylacauga, Talladega County, Alabama... talked to about stacks stx and hopes the... 39 The Moon 33.1584497 -86.2385846 None
1 280261 Lisboa, Portugal i m so f tired of ignorant american phd econom... 1614 Lisbon, Portugal 38.7077507 -9.1365919 None
2 336653 Muhu, Saare maakond, Eesti long bitcoin 43 Moon 58.5959044 23.21964608602439 None
3 872349 Global, 81, Barber Greene Road, Don Mills, Don... bitcoin bull cathie wood attracts big short... 77240 Global 43.7283874 -79.34914879325001 None
4 109405 Canada bitcoin price in us dollar btc usd btcusd ... 250 Canada 61.0666922 -107.991707 None
... ... ... ... ... ... ... ... ...
495 898076 Hyderabad, Bahadurpura mandal, Hyderabad Distr... freedom 35 official bsc bitcoin via 112 Hyderabad, India 17.360589 78.4740613 None
496 951645 b'424c4f434b434841494e2c20d09dd0b0d0b1d0b5d180... bitcoin btc i don t like these lower highs... 21254 #blockchain 44.6465984 34.4007341 None
497 7974 Sydney, Council of the City of Sydney, New Sou... what goes up must comes down hard question ... 26 Sydney, New South Wales -33.8698439 151.2082848 None
498 788995 Montréal, Agglomération de Montréal, Montr... am i the only one that likes watching the sats... 135 Montréal, Québec 45.5031824 -73.5698065 None
499 62652 Laguna Beach, Orange County, California, Unite... lamb is a fast safe and scalable blockchai... 3784 Laguna Beach, CA 33.5426975 -117.785366 None

500 rows × 8 columns

The next step is to convert longitude and latitude to geometry and assign it the "EPSG:4326" coordinate reference system. Code source: GeoPandas Documentation (2023).

In [121]:
bitcoin_tweet = gpd.GeoDataFrame(
    bitcoin2, crs="EPSG:4326",
    geometry=gpd.points_from_xy(bitcoin2.longitude, bitcoin2.latitude),  # x = longitude, y = latitude
)
In [122]:
bitcoin_tweet
Out[122]:
field_1 user_location text user_followers locations latitude longitude geometry
0 714437 The Moon, Sylacauga, Talladega County, Alabama... talked to about stacks stx and hopes the... 39 The Moon 33.1584497 -86.2385846 POINT (-86.23858 33.15845)
1 280261 Lisboa, Portugal i m so f tired of ignorant american phd econom... 1614 Lisbon, Portugal 38.7077507 -9.1365919 POINT (-9.13659 38.70775)
2 336653 Muhu, Saare maakond, Eesti long bitcoin 43 Moon 58.5959044 23.21964608602439 POINT (23.21965 58.59590)
3 872349 Global, 81, Barber Greene Road, Don Mills, Don... bitcoin bull cathie wood attracts big short... 77240 Global 43.7283874 -79.34914879325001 POINT (-79.34915 43.72839)
4 109405 Canada bitcoin price in us dollar btc usd btcusd ... 250 Canada 61.0666922 -107.991707 POINT (-107.99171 61.06669)
... ... ... ... ... ... ... ... ...
495 898076 Hyderabad, Bahadurpura mandal, Hyderabad Distr... freedom 35 official bsc bitcoin via 112 Hyderabad, India 17.360589 78.4740613 POINT (78.47406 17.36059)
496 951645 b'424c4f434b434841494e2c20d09dd0b0d0b1d0b5d180... bitcoin btc i don t like these lower highs... 21254 #blockchain 44.6465984 34.4007341 POINT (34.40073 44.64660)
497 7974 Sydney, Council of the City of Sydney, New Sou... what goes up must comes down hard question ... 26 Sydney, New South Wales -33.8698439 151.2082848 POINT (151.20828 -33.86984)
498 788995 Montréal, Agglomération de Montréal, Montr... am i the only one that likes watching the sats... 135 Montréal, Québec 45.5031824 -73.5698065 POINT (-73.56981 45.50318)
499 62652 Laguna Beach, Orange County, California, Unite... lamb is a fast safe and scalable blockchai... 3784 Laguna Beach, CA 33.5426975 -117.785366 POINT (-117.78537 33.54270)

500 rows × 8 columns

Task 2.3 Polarity analysis

Calculate the polarity values of all the tweets. For a given geographical location, if you have more than one tweet then find the average polarity value taking into consideration all the tweets generated from the same location. Using a suitable plot type (such as a geographical map), perform a geospatial visualisation of the polarities corresponding to all the tweets. Whilst you are free to choose a plot type, the visualisation must be clear and easy to understand/interpret.

In [123]:
# define a function that gets text polarity
def getTextPolarity(txt):
    return TextBlob(txt).sentiment.polarity
In [124]:
bitcoin_tweet['polarity'] = bitcoin_tweet['text'].apply(getTextPolarity)
bitcoin_tweet.head()
Out[124]:
field_1 user_location text user_followers locations latitude longitude geometry polarity
0 714437 The Moon, Sylacauga, Talladega County, Alabama... talked to about stacks stx and hopes the... 39 The Moon 33.1584497 -86.2385846 POINT (-86.23858 33.15845) -0.125000
1 280261 Lisboa, Portugal i m so f tired of ignorant american phd econom... 1614 Lisbon, Portugal 38.7077507 -9.1365919 POINT (-9.13659 38.70775) -0.133333
2 336653 Muhu, Saare maakond, Eesti long bitcoin 43 Moon 58.5959044 23.21964608602439 POINT (23.21965 58.59590) -0.050000
3 872349 Global, 81, Barber Greene Road, Don Mills, Don... bitcoin bull cathie wood attracts big short... 77240 Global 43.7283874 -79.34914879325001 POINT (-79.34915 43.72839) 0.000000
4 109405 Canada bitcoin price in us dollar btc usd btcusd ... 250 Canada 61.0666922 -107.991707 POINT (-107.99171 61.06669) 0.165000
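The task asks for an average polarity when several tweets share a location, whereas the cell above scores each tweet individually. Below is a minimal sketch of that averaging step, grouping by the geocoded coordinates; the toy rows and the output column names mean_polarity and n_tweets are invented for illustration.

```python
import pandas as pd

tweets = pd.DataFrame({
    "locations": ["Canada", "Canada", "Mainz"],
    "latitude": [61.07, 61.07, 50.0],
    "longitude": [-107.99, -107.99, 8.27],
    "polarity": [0.2, -0.1, 0.5],
})

# One row per distinct coordinate pair, with the mean polarity and the number of tweets.
per_location = (
    tweets.groupby(["latitude", "longitude"], as_index=False)
    .agg(mean_polarity=("polarity", "mean"), n_tweets=("polarity", "size"))
)
print(per_location)
```

The same pattern applies to the subjectivity column, and the aggregated frame can be turned into a GeoDataFrame with gpd.points_from_xy before plotting.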
In [125]:
import seaborn as sns  # seaborn is a library built on top of matplotlib that aids visualisation
In [126]:
sns.histplot(x='polarity', data= bitcoin_tweet, color='green') #Plot polarity values count with sns
Out[126]:
<Axes: xlabel='polarity', ylabel='Count'>

Define a function that labels the polarity values (Oluyale, 2023).

In [127]:
def definepolarity(x):
    if x > 0.00:
        return "Positive"
    elif x < 0.00:
        return "Negative"
    elif x == 0:
        return "Neutral"

Apply the defined function and create another column for the polarity label.

In [128]:
bitcoin_tweet["polarity_label"] = bitcoin_tweet["polarity"].apply(definepolarity) 
In [129]:
bitcoin_tweet #inspect
Out[129]:
field_1 user_location text user_followers locations latitude longitude geometry polarity polarity_label
0 714437 The Moon, Sylacauga, Talladega County, Alabama... talked to about stacks stx and hopes the... 39 The Moon 33.1584497 -86.2385846 POINT (-86.23858 33.15845) -0.125000 Negative
1 280261 Lisboa, Portugal i m so f tired of ignorant american phd econom... 1614 Lisbon, Portugal 38.7077507 -9.1365919 POINT (-9.13659 38.70775) -0.133333 Negative
2 336653 Muhu, Saare maakond, Eesti long bitcoin 43 Moon 58.5959044 23.21964608602439 POINT (23.21965 58.59590) -0.050000 Negative
3 872349 Global, 81, Barber Greene Road, Don Mills, Don... bitcoin bull cathie wood attracts big short... 77240 Global 43.7283874 -79.34914879325001 POINT (-79.34915 43.72839) 0.000000 Neutral
4 109405 Canada bitcoin price in us dollar btc usd btcusd ... 250 Canada 61.0666922 -107.991707 POINT (-107.99171 61.06669) 0.165000 Positive
... ... ... ... ... ... ... ... ... ... ...
495 898076 Hyderabad, Bahadurpura mandal, Hyderabad Distr... freedom 35 official bsc bitcoin via 112 Hyderabad, India 17.360589 78.4740613 POINT (78.47406 17.36059) 0.000000 Neutral
496 951645 b'424c4f434b434841494e2c20d09dd0b0d0b1d0b5d180... bitcoin btc i don t like these lower highs... 21254 #blockchain 44.6465984 34.4007341 POINT (34.40073 44.64660) 0.000000 Neutral
497 7974 Sydney, Council of the City of Sydney, New Sou... what goes up must comes down hard question ... 26 Sydney, New South Wales -33.8698439 151.2082848 POINT (151.20828 -33.86984) -0.223611 Negative
498 788995 Montréal, Agglomération de Montréal, Montr... am i the only one that likes watching the sats... 135 Montréal, Québec 45.5031824 -73.5698065 POINT (-73.56981 45.50318) 0.250000 Positive
499 62652 Laguna Beach, Orange County, California, Unite... lamb is a fast safe and scalable blockchai... 3784 Laguna Beach, CA 33.5426975 -117.785366 POINT (-117.78537 33.54270) 0.350000 Positive

500 rows × 10 columns

In [130]:
#Plot polarity label count with sns
In [131]:
plt.figure(figsize=(12,4))
sns.countplot(x='polarity_label', data=bitcoin_tweet)
plt.title('Polarity Label Count')
plt.show()
In [132]:
bitcoin_tweet.columns
Out[132]:
Index(['field_1', 'user_location', 'text', 'user_followers', 'locations',
       'latitude', 'longitude', 'geometry', 'polarity', 'polarity_label'],
      dtype='object')

Keep only the columns needed for map visualization.

In [133]:
bitcoin_tweet2= bitcoin_tweet[['geometry', 'polarity', 'polarity_label']]

Plotting Polarity

Polarity Values

In [134]:
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "gray")
bitcoin_tweet2.plot(ax=ax, column='polarity', legend=True, legend_kwds={"label": 'Polarity Values Spread of Bitcoin Tweets', "orientation":"horizontal"}, cmap='Set1')
plt.show()

Polarity Values on Interactive Map with GeoPandas .explore()

In [135]:
bitcoin_tweet2.explore(column='polarity',   # color points by the "polarity" column
    tooltip=['polarity'],  # show "polarity" value in tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    tiles="openstreetmap",  # use "openstreetmap" tiles
    cmap="Set1",  # use "Set1" matplotlib colormap
    legend=True,
    marker_kwds=dict(radius=5,icon=folium.Icon(icon='house-blank')),
    style_kwds=dict(color="black"), # use black outline
    zoom_control=True,
    #zoom_start = 1,
                    )
Out[135]:

Polarity Labels

In [136]:
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "grey")
bitcoin_tweet2.plot(ax=ax, column='polarity_label', legend=True,  cmap='viridis')
plt.show()

Polarity Labels on Interactive Map with GeoPandas .explore()

In [137]:
bitcoin_tweet2.explore(column='polarity_label',   # color points by the "polarity_label" column
    tooltip=['polarity_label'],  # show "polarity_label" value in tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    tiles="openstreetmap",  # use "openstreetmap" tiles
    cmap="viridis",  # use "viridis" matplotlib colormap
    legend=True,
    marker_kwds=dict(radius=5,icon=folium.Icon(icon='house-blank')),
    style_kwds=dict(color="black"), # use black outline
    zoom_control=True,
    #zoom_start = 1,
                      )
Out[137]:
[Interactive map: tweet locations coloured by polarity label]

Task 2.4 Subjectivity Analysis

Calculate the subjectivity values of all the tweets. For a given geographical location, if you have more than one tweet then find the average subjectivity value taking into consideration all the tweets generated from the same location. Using a suitable plot type (such as a geographical map), perform a geospatial visualisation of the subjectivities corresponding to all the tweets. Whilst you are free to choose a plot type, the visualisation must be clear and easy to understand/interpret.
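
The task statement asks for one averaged subjectivity value per location. A minimal sketch of that aggregation is given here, using a small made-up frame whose column names (latitude, longitude, subjectivity) mirror the ones used in this notebook. The values and the names tweets and location_mean are illustrative assumptions, not part of the dataset or of the cells that follow.

```python
import pandas as pd

# Toy data (hypothetical values): the first two tweets share the same coordinates.
tweets = pd.DataFrame({
    "latitude": [51.5074456, 51.5074456, 38.7077507],
    "longitude": [-0.1277653, -0.1277653, -9.1365919],
    "subjectivity": [0.2, 0.6, 0.3],
})

# One row per unique coordinate pair, holding the mean subjectivity
# and the number of tweets that contributed to that mean.
location_mean = (
    tweets.groupby(["latitude", "longitude"], as_index=False)
          .agg(mean_subjectivity=("subjectivity", "mean"),
               tweet_count=("subjectivity", "count"))
)
print(location_mean)
```

Grouping on the coordinate pair rather than on the free-text location string means that differently spelled labels which geocode to the same point are averaged together.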

Define a function that extracts the subjectivity value from the tweet text

In [138]:
def getTextSubjectivity(txt):
    return TextBlob(txt).sentiment.subjectivity

Apply the defined function and inspect

In [139]:
bitcoin_tweet['subjectivity'] = bitcoin_tweet['text'].apply(getTextSubjectivity)
bitcoin_tweet
Out[139]:
|     | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity |
|----:|--------:|:--------------|:-----|---------------:|:----------|---------:|----------:|:---------|---------:|:---------------|-------------:|
| 0 | 714437 | The Moon, Sylacauga, Talladega County, Alabama... | talked to about stacks stx and hopes the... | 39 | The Moon | 33.1584497 | -86.2385846 | POINT (-86.23858 33.15845) | -0.125000 | Negative | 0.375000 |
| 1 | 280261 | Lisboa, Portugal | i m so f tired of ignorant american phd econom... | 1614 | Lisbon, Portugal | 38.7077507 | -9.1365919 | POINT (-9.13659 38.70775) | -0.133333 | Negative | 0.233333 |
| 2 | 336653 | Muhu, Saare maakond, Eesti | long bitcoin | 43 | Moon | 58.5959044 | 23.21964608602439 | POINT (23.21965 58.59590) | -0.050000 | Negative | 0.400000 |
| 3 | 872349 | Global, 81, Barber Greene Road, Don Mills, Don... | bitcoin bull cathie wood attracts big short... | 77240 | Global | 43.7283874 | -79.34914879325001 | POINT (-79.34915 43.72839) | 0.000000 | Neutral | 0.200000 |
| 4 | 109405 | Canada | bitcoin price in us dollar btc usd btcusd ... | 250 | Canada | 61.0666922 | -107.991707 | POINT (-107.99171 61.06669) | 0.165000 | Positive | 0.351667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 898076 | Hyderabad, Bahadurpura mandal, Hyderabad Distr... | freedom 35 official bsc bitcoin via | 112 | Hyderabad, India | 17.360589 | 78.4740613 | POINT (78.47406 17.36059) | 0.000000 | Neutral | 0.000000 |
| 496 | 951645 | b'424c4f434b434841494e2c20d09dd0b0d0b1d0b5d180... | bitcoin btc i don t like these lower highs... | 21254 | #blockchain | 44.6465984 | 34.4007341 | POINT (34.40073 44.64660) | 0.000000 | Neutral | 0.000000 |
| 497 | 7974 | Sydney, Council of the City of Sydney, New Sou... | what goes up must comes down hard question ... | 26 | Sydney, New South Wales | -33.8698439 | 151.2082848 | POINT (151.20828 -33.86984) | -0.223611 | Negative | 0.415278 |
| 498 | 788995 | Montréal, Agglomération de Montréal, MontrÃ... | am i the only one that likes watching the sats... | 135 | Montréal, Québec | 45.5031824 | -73.5698065 | POINT (-73.56981 45.50318) | 0.250000 | Positive | 0.750000 |
| 499 | 62652 | Laguna Beach, Orange County, California, Unite... | lamb is a fast safe and scalable blockchai... | 3784 | Laguna Beach, CA | 33.5426975 | -117.785366 | POINT (-117.78537 33.54270) | 0.350000 | Positive | 0.550000 |

500 rows × 11 columns

Define a function that labels the subjectivity values (Oluyale, 2023).

In [140]:
def definesubjectivity(x):
    # label a TextBlob subjectivity score (0 = objective, 1 = subjective)
    if x > 0.5:
        return "subjective"
    elif x < 0.5:
        return "Factual"
    elif x == 0.5:
        return "Neutral"
Apply the defined function

In [142]:
bitcoin_tweet["subjectivity_label"] = bitcoin_tweet["subjectivity"].apply(definesubjectivity)
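
For larger frames, the same three-way rule can be expressed without calling a Python function per row. The following is a sketch using numpy.select, assuming the same 0.5 cut-off and the same label strings as the function defined earlier. The sample scores, the variable names and the "Unknown" fallback for missing values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative scores only; in the notebook this would be the subjectivity column.
scores = pd.Series([0.375, 0.5, 0.766667, 0.0], name="subjectivity")

conditions = [scores > 0.5, scores < 0.5, scores == 0.5]
labels = ["subjective", "Factual", "Neutral"]

# Missing values satisfy none of the conditions and fall through to the default.
subjectivity_label = pd.Series(
    np.select(conditions, labels, default="Unknown"),
    index=scores.index,
    name="subjectivity_label",
)
print(subjectivity_label.tolist())
```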

Plot a histogram of the subjectivity values

In [143]:
sns.histplot(x='subjectivity', data= bitcoin_tweet, color='orange')
Out[143]:
<Axes: xlabel='subjectivity', ylabel='Count'>
[Figure: histogram of subjectivity values]

Draw a count plot of the subjectivity labels

In [144]:
plt.figure(figsize=(12,4))
sns.countplot(x='subjectivity_label', data=bitcoin_tweet)
plt.title('Subjectivity Label Count')
plt.show()
[Figure: count plot of subjectivity labels]
In [145]:
bitcoin_tweet.columns #inspect available columns
Out[145]:
Index(['field_1', 'user_location', 'text', 'user_followers', 'locations',
       'latitude', 'longitude', 'geometry', 'polarity', 'polarity_label',
       'subjectivity', 'subjectivity_label'],
      dtype='object')

Choose columns necessary for map visualization

In [146]:
bitcoin_tweet3 = bitcoin_tweet[['geometry', 'subjectivity', 'subjectivity_label']]

Map Visualisation of Subjectivity

Plotting subjectivity values

In [147]:
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "grey")
bitcoin_tweet3.plot(ax=ax, column='subjectivity', legend=True, legend_kwds={"label": 'Subjectivity Spread of Bitcoin Tweets', "orientation":"horizontal"}, cmap='Set1')
plt.show()
[Figure: world map with tweet locations coloured by subjectivity value]

Plotting subjectivity values with interactive map using geopandas .explore()

In [148]:
bitcoin_tweet3.explore(
    column='subjectivity',  # colour points by the "subjectivity" column
    tooltip=['subjectivity'],  # show "subjectivity" value in tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    tiles="openstreetmap",  # use "openstreetmap" tiles
    cmap="Set1",  # use "Set1" matplotlib colormap
    legend=True,
    marker_kwds=dict(radius=5, icon=folium.Icon(icon='house-blank')),
    style_kwds=dict(color="black"),  # use black outline
    zoom_control=True,
    # zoom_start=1,
)
Out[148]:
[Interactive map: tweet locations coloured by subjectivity value]

Plotting subjectivity label

In [149]:
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "grey")
bitcoin_tweet3.plot(ax=ax, column='subjectivity_label', legend=True,  cmap='viridis')
plt.show()
[Figure: world map with tweet locations coloured by subjectivity label]

Plotting subjectivity label with interactive map using geopandas .explore()

In [150]:
bitcoin_tweet3.explore(
    column='subjectivity_label',  # colour points by the "subjectivity_label" column
    tooltip=['subjectivity_label'],  # show "subjectivity_label" value in tooltip (on hover)
    popup=True,  # show all values in popup (on click)
    tiles="openstreetmap",  # use "openstreetmap" tiles
    cmap="Set1",  # use "Set1" matplotlib colormap
    legend=True,
    marker_kwds=dict(radius=5, icon=folium.Icon(icon='house-blank')),
    style_kwds=dict(color="black"),  # use black outline
    zoom_control=True,
    # zoom_start=1,
)
Out[150]:
[Interactive map: tweet locations coloured by subjectivity label]

Reading a GeoDataFrame from a DataFrame with coordinates

A GeoDataFrame needs a geometry column of shapely objects. We use the GeoPandas function points_from_xy() to transform the longitude and latitude columns into an array of shapely Point objects and set it as the geometry when creating the GeoDataFrame. (Note that points_from_xy() is an enhanced wrapper for [Point(x, y) for x, y in zip(df.Longitude, df.Latitude)].) The crs value is also set, to state explicitly that the geometry holds latitude/longitude values in World Geodetic System degrees. This matters for the correct interpretation of the data, for example when plotting it together with data in other coordinate systems.
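
The construction described in this paragraph can be sketched as follows. The two sample rows reuse coordinates that appear in the tables of this notebook. The frame name df, the lower-case column names and the choice of "EPSG:4326" as the crs string are assumptions made for illustration.

```python
import pandas as pd
import geopandas as gpd

# Two illustrative rows; the real frame holds one row per tweet.
df = pd.DataFrame({
    "locations": ["Lisbon, Portugal", "Canada"],
    "latitude": [38.7077507, 61.0666922],
    "longitude": [-9.1365919, -107.991707],
})

# x is longitude and y is latitude; EPSG:4326 declares WGS 84 degrees.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)
print(gdf.geometry.iloc[0])
```

Passing longitude first matters: swapping the two arguments still produces valid points, but they land in the wrong place on the map.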

Now we interpret the sentiment based on the compound score. If the compound score is greater than or equal to 0.05, it's considered positive. If it's less than or equal to -0.05, it's considered negative. Otherwise, it's considered neutral. (Oluyale, 2023)
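
The thresholding rule described above can be written as a small function. This is a sketch only: the name label_sentiment and the threshold parameter are hypothetical, and the rule is shown for any score in the range -1 to 1.

```python
def label_sentiment(score, threshold=0.05):
    """Map a score in [-1, 1] to a label using a symmetric neutral band."""
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


# A few illustrative scores on either side of the cut-offs.
for s in (0.31, 0.0, -0.2667, 0.04):
    print(s, label_sentiment(s))
```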

Oluyale, D. (2023, September 16). Sentiment Analysis using various options in Python Machine Learning. Medium. https://medium.com/@oluyaled/sentiment-analysis-using-various-options-in-python-machine-learning-aaa24ea0991c

Task 2.5 Storify/Interpretation

In this task, use your geospatial data analytical skill to storify (in not more than 500 words) the results obtained in the preceding two tasks. Imagine yourself as a policy advisor to the UK government whose job is to update about the public sentiment related to cryptocurrency across different parts of the world. You may try to answer some of these example questions – How is the public opinion about cryptocurrency? Which locations have positive views about this issue and where can you see a vast amount of negativity? Despite having positive/negative/mixed sentiment about cryptocurrency, will you take these tweets very seriously (HINT: if the tweet originates from outside the UK, then it may not affect the government policies!)? Are the messages loud and clear? Please note that these are only suggestive questions. You are strongly recommended to not constrain your sentiment analytical skills

The storification task requires additional code and visualisations.

In [151]:
print(bitcoin_tweet['locations'].value_counts().to_markdown())
| locations                                           |   count |
|:----------------------------------------------------|--------:|
| Bay Area, CA                                        |      19 |
| London, England                                     |      15 |
| United States                                       |      13 |
| Global                                              |      10 |
| New York, NY                                        |      10 |
| Australia                                           |       8 |
| Matter Doesn't Matter                               |       6 |
| Moon                                                |       6 |
| New York                                            |       6 |
| Earth                                               |       6 |
| The Moon                                            |       5 |
| England, United Kingdom                             |       5 |
| United Kingdom                                      |       5 |
| Blockchain                                          |       5 |
| India                                               |       5 |
| Birmingham, England                                 |       4 |
| Worldwide                                           |       4 |
| Europe                                              |       4 |
| California, USA                                     |       4 |
| Florida, USA                                        |       4 |
| New York, USA                                       |       4 |
| Mars                                                |       4 |
| Trade here 👉                                     |       4 |
| Singapore                                           |       4 |
| UK                                                  |       4 |
| Pakistan                                            |       3 |
| Miami, FL                                           |       3 |
| Chicago, IL                                         |       3 |
| Paris, FR                                           |       3 |
| Canada                                              |       3 |
| Bitcoin                                             |       3 |
| London                                              |       3 |
| Islamabad, Pakistan                                 |       3 |
| USA                                                 |       3 |
| World                                               |       3 |
| Kolkata, India                                      |       2 |
| Sankt-Peterburg                                     |       2 |
| Türkiye                                            |       2 |
| New York City                                       |       2 |
| San Diego, CA                                       |       2 |
| London, UK                                          |       2 |
| To the Moon                                         |       2 |
| Bangladesh                                          |       2 |
| Internet                                            |       2 |
| Houston, TX                                         |       2 |
| Paris, France                                       |       2 |
| Tennessee                                           |       2 |
| Austin                                              |       2 |
| tesvikiye                                           |       2 |
| America                                             |       2 |
| Bellevue, WA                                        |       2 |
| Spain                                               |       2 |
| Buffalo, NY                                         |       2 |
| São Paulo, Brasil 🇧🇷                         |       2 |
| London, England 🇬🇧                            |       2 |
| Dhaka, Bangladesh                                   |       2 |
| Minnesota                                           |       2 |
| on an island 🇨🇦                               |       2 |
| Manhattan, NY                                       |       2 |
| Rotterdam, Nederland                                |       2 |
| Nova Scotia, Canada                                 |       2 |
| Everywhere                                          |       2 |
| Hong Kong                                           |       2 |
| South Africa                                        |       2 |
| Crypto World                                        |       2 |
| Johannesburg, South Africa                          |       2 |
| Jersey City, New Jersey                             |       2 |
| Sydney, New South Wales                             |       2 |
| Toronto, Ontario                                    |       2 |
| Estados Unidos                                      |       2 |
| Kansas City, MO                                     |       2 |
| somewhere                                           |       2 |
| Philadelphia, PA                                    |       2 |
| Metaverse                                           |       1 |
| 👇YouTube Channel👇                             |       1 |
| Paris                                               |       1 |
| Parts Unknown                                       |       1 |
| भारत                                        |       1 |
| 127.0.0.1                                           |       1 |
| Islamic Republic of Iran                            |       1 |
| Metz / Paris, FRANCE                                |       1 |
| United States Of American                           |       1 |
| El Salvador                                         |       1 |
| Penzance, Cornwall                                  |       1 |
| Western Europe                                      |       1 |
| east Godavari, Andhra Pradesh                       |       1 |
| New Delhi, India                                    |       1 |
| Cambridge, MA                                       |       1 |
| Galt’s Gulch                                      |       1 |
| Orange County, California                           |       1 |
| Near You                                            |       1 |
| Ukraine                                             |       1 |
| Milan, Lombardy                                     |       1 |
| Cuddapah, India                                     |       1 |
| Bali, Indonesia                                     |       1 |
| DE 🇩🇪 & SA🇿🇦                            |       1 |
| Santa Monica, CA                                    |       1 |
| SPORTS BETTING (THE CASINO)                         |       1 |
| Grindelwald, Schweiz                                |       1 |
| Jawa Tengah, Indonesia                              |       1 |
| Traveler..                                          |       1 |
| Texas, USA                                          |       1 |
| Raigarh, India                                      |       1 |
| New York, NC                                        |       1 |
| U.S.A!                                              |       1 |
| Northern California                                 |       1 |
| b'd0bad0b0d0b7d0b0d185d181d182d0b0d0bd'             |       1 |
| Lausanne (Switzerland)                              |       1 |
| Worldwide                                           |       1 |
| Nouvelle calédonie                                 |       1 |
| DKI Jakarta, Indonesia                              |       1 |
| Copenhagen                                          |       1 |
| Bandung, Jawa Barat                                 |       1 |
| Los Angeles                                         |       1 |
| Morocco                                             |       1 |
| Russia                                              |       1 |
| Wayne, PA                                           |       1 |
| Dublin, Ireland                                     |       1 |
| Loyalsock Township, PA                              |       1 |
| Chile                                               |       1 |
| EARTH                                               |       1 |
| Brisbane                                            |       1 |
| East Midlands, England                              |       1 |
| Brookline, MA                                       |       1 |
| Faridpur, Bangladesh                                |       1 |
| Ciudad Real, Spain                                  |       1 |
| Washington, DC                                      |       1 |
| INDONESIA                                           |       1 |
| Europa                                              |       1 |
| Massachusetts                                       |       1 |
| Toronto                                             |       1 |
| Uranus                                              |       1 |
| South Korea                                         |       1 |
| Down the Rabbit Hole                                |       1 |
| Moon, PA                                            |       1 |
| Glasgow, Scotland                                   |       1 |
| London, United Kingdom                              |       1 |
| Sukabumi, Indonesia                                 |       1 |
| Anaheim, CA                                         |       1 |
| The Netherlands                                     |       1 |
| Jaipur Rajasthan                                    |       1 |
| Seattle, WA                                         |       1 |
| Detroit Michigan                                    |       1 |
| Cincinnati, OH                                      |       1 |
| Space Mountain                                      |       1 |
| Melbourne, Victoria                                 |       1 |
| Telegram                                            |       1 |
| Road Warrior                                        |       1 |
| ¯\_(ツ)_/¯                                       |       1 |
| Germany                                             |       1 |
| Ann Arbor                                           |       1 |
| Everywhere 🗺                                     |       1 |
| Tamil Nadu                                          |       1 |
| Hyderabad, India                                    |       1 |
| #blockchain                                         |       1 |
| Montréal, Québec                                  |       1 |
| British Columbia, Canada                            |       1 |
| Florida                                             |       1 |
| Sumatera Selatan, Indonesia                         |       1 |
| Columbia, SC                                        |       1 |
| Little Rock, AR                                     |       1 |
| Istanbul, Turkey                                    |       1 |
| Tucson, AZ                                          |       1 |
| South Park, CO                                      |       1 |
| Los Angeles, CA                                     |       1 |
| 420 Wall St, NY                                     |       1 |
| Spratly Islands                                     |       1 |
| Malang, Jawa Timur                                  |       1 |
| Ondo, Nigeria                                       |       1 |
| Texas                                               |       1 |
| Curaçao                                            |       1 |
| Nigeria                                             |       1 |
| Oslo, Norway                                        |       1 |
| Irving, Tx                                          |       1 |
| World Wide                                          |       1 |
| earth                                               |       1 |
| Tegal                                               |       1 |
| Santa Fe, New Mexico                                |       1 |
| Sydney, Australia                                   |       1 |
| Bangkok, Thailand                                   |       1 |
| Jakarta, Indonesia                                  |       1 |
| Fairfax, VA                                         |       1 |
| Strazburg, France                                   |       1 |
| Los Angeles, California, USA                        |       1 |
| in search                                           |       1 |
| bella ciao                                          |       1 |
| Barcelona / Bangkok                                 |       1 |
| International Space Station 🚀                    |       1 |
| Larissa, GR                                         |       1 |
| South East, England                                 |       1 |
| Switzerland                                         |       1 |
| Maui, Hawaii                                        |       1 |
| Shanghai, China                                     |       1 |
| Milan, Italy                                        |       1 |
| Oklahoma City, OK                                   |       1 |
| Busan, Republic of Korea                            |       1 |
| West Dhanmondi, Dhaka                               |       1 |
| Bumi Nusantara                                      |       1 |
| Austin, TX                                          |       1 |
| Brussel, België                                    |       1 |
| Dallas, TX                                          |       1 |
| Kyiv, Ukraine                                       |       1 |
| Rangpur, Bangladesh                                 |       1 |
| Glasgow, UK                                         |       1 |
| Zug | Berlin                                        |       1 |
| Landgraaf, Nederland                                |       1 |
| here                                                |       1 |
| Argentina                                           |       1 |
| Washington D.C                                      |       1 |
| Shambhala                                           |       1 |
| Planet Earth                                        |       1 |
| American Fork, UT                                   |       1 |
| Salt Lake City, Utah                                |       1 |
| Tehran                                              |       1 |
| Garmany                                             |       1 |
| Utah                                                |       1 |
| Internationalist                                    |       1 |
| Moon                                                |       1 |
| 대한민국 안산시                              |       1 |
| Pekanbaru, Riau                                     |       1 |
| united states                                       |       1 |
| Alger                                               |       1 |
| Global                                              |       1 |
| Meta                                                |       1 |
| Mother Earth                                        |       1 |
| Punjab, Pakistan                                    |       1 |
| Bareilly, India                                     |       1 |
| @Moon                                               |       1 |
| Turkey                                              |       1 |
| Boston, MA                                          |       1 |
| California, USA🇺🇸                             |       1 |
| San Tan Valley, AZ                                  |       1 |
| الولايات المتحدة الأمريكية  |       1 |
| Samsun, Türkiye                                    |       1 |
| Future                                              |       1 |
| San Mateo, CA                                       |       1 |
| Westland, MI                                        |       1 |
| Miami Florida                                       |       1 |
| Mewn                                                |       1 |
| Guess                                               |       1 |
| Burger-galaxy                                       |       1 |
| Ireland                                             |       1 |
| Pace, FL                                            |       1 |
| Watford, Hertfordshire, UK                          |       1 |
| Malaysia                                            |       1 |
| Rio de Janeiro, Brazil                              |       1 |
| Lewes, DE                                           |       1 |
| Between Here & There                                |       1 |
| Sydney                                              |       1 |
| Bay Area, California                                |       1 |
| Mumbai                                              |       1 |
| Dubai, United Arab Emirates                         |       1 |
| Gazipur, Dhaka                                      |       1 |
| Kansas City                                         |       1 |
| Tucuman, Argentina                                  |       1 |
| Ohio, USA                                           |       1 |
| België                                             |       1 |
| Washington state                                    |       1 |
| Jackson,Mississippi,USA                             |       1 |
| Antwerp                                             |       1 |
| Englewood Cliffs, NJ                                |       1 |
| India                                               |       1 |
| West Bengal, India                                  |       1 |
| İstanbul, Türkiye                                 |       1 |
| Mumbai, India                                       |       1 |
| Ahmedabad City, India                               |       1 |
| #الكرة_الارضية                          |       1 |
| Amsterdam, Nederland                                |       1 |
| Wellington City, New Zealand                        |       1 |
| Ankara, Türkiye                                    |       1 |
| South West, England                                 |       1 |
| halp, USA                                           |       1 |
| Durham, NC                                          |       1 |
| Texas                                               |       1 |
| Greenville, SC                                      |       1 |
| india                                               |       1 |
| Wyoming                                             |       1 |
| AZ/NH/MA                                            |       1 |
| THE MOON                                            |       1 |
| Out and About                                       |       1 |
| Space                                               |       1 |
| City of London, London                              |       1 |
| Windsor, Ontario                                    |       1 |
| b'536f7574682041667269636120f09f87bff09f87a6e29da4' |       1 |
| Hamburg, Deutschland                                |       1 |
| Nova Friburgo, Brasil                               |       1 |
| Gas Fields, Louisiana                               |       1 |
| now                                                 |       1 |
| Lagos                                               |       1 |
| New Orleans                                         |       1 |
| Jaderberg                                           |       1 |
| Chattogram , Bangladesh                             |       1 |
| Michigan, USA                                       |       1 |
| 3rd rock from the sun                               |       1 |
| Islington, London                                   |       1 |
| Bogor                                               |       1 |
| Fort Worth, Texas                                   |       1 |
| New York & Taipei                                   |       1 |
| Venezuela                                           |       1 |
| Indiana, USA                                        |       1 |
| Sri lanka                                           |       1 |
| Lisbon, Portugal                                    |       1 |
| Laguna Beach, CA                                    |       1 |
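
The counts above list the same place under several spellings (for example London, England, London, UK and London), so an exact match on a single string picks up only part of the UK tweets. A rough sketch of a broader filter follows. The keyword pattern uk_pattern is a hand-written assumption, not an exhaustive gazetteer, and it would need checking against the full list before use.

```python
import pandas as pd

# A handful of labels taken from the table above.
locations = pd.Series([
    "London, England", "London, UK", "United Kingdom", "Glasgow, Scotland",
    "New York, NY", "Ukraine", "Moon",
], name="locations")

# Word boundaries keep "UK" from matching "Ukraine". A bare "London" would
# also match places such as London, Ontario, so results need a manual check.
uk_pattern = r"\b(?:UK|United Kingdom|England|Scotland|Wales|Northern Ireland|London)\b"

is_uk = locations.str.contains(uk_pattern, case=False, regex=True)
print(locations[is_uk].tolist())
```

Computed on the locations column of the full frame, the boolean mask can be used for row selection in the same way as an exact comparison.
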
In [152]:
bitcoin_tweet[:3]
Out[152]:
|     | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label |
|----:|--------:|:--------------|:-----|---------------:|:----------|---------:|----------:|:---------|---------:|:---------------|-------------:|:-------------------|
| 0 | 714437 | The Moon, Sylacauga, Talladega County, Alabama... | talked to about stacks stx and hopes the... | 39 | The Moon | 33.1584497 | -86.2385846 | POINT (-86.23858 33.15845) | -0.125000 | Negative | 0.375000 | Factual |
| 1 | 280261 | Lisboa, Portugal | i m so f tired of ignorant american phd econom... | 1614 | Lisbon, Portugal | 38.7077507 | -9.1365919 | POINT (-9.13659 38.70775) | -0.133333 | Negative | 0.233333 | Factual |
| 2 | 336653 | Muhu, Saare maakond, Eesti | long bitcoin | 43 | Moon | 58.5959044 | 23.21964608602439 | POINT (23.21965 58.59590) | -0.050000 | Negative | 0.400000 | Factual |
In [153]:
bri = bitcoin_tweet[bitcoin_tweet["locations"] == 'London, England']
                   
In [154]:
bri
Out[154]:
| | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 419791 | London, Greater London, England, United Kingdom | thanks for giving us such a great opportunit... | 129 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.100000e-01 | Positive | 0.340000 | Factual |
| 54 | 295173 | London, Greater London, England, United Kingdom | bitcoin fear index on google search statistics... | 202 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.750000e-01 | Positive | 0.500000 | Neutral |
| 103 | 820864 | London, Greater London, England, United Kingdom | no trade btc price is 30888 at time 08 07 21... | 1523 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 112 | 505680 | London, Greater London, England, United Kingdom | grayscale pairs with coindesk index to launch ... | 62 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 117 | 271016 | London, Greater London, England, United Kingdom | it was good while it lasted now it is time ... | 129 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.500000e-01 | Positive | 0.300000 | Factual |
| 234 | 513485 | London, Greater London, England, United Kingdom | grayscale investments launches defi fund now... | 62 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 1.000000e-01 | Positive | 0.000000 | Factual |
| 264 | 520206 | London, Greater London, England, United Kingdom | ready for elon to say some stupid shit trigger... | 63 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | -2.666667e-01 | Negative | 0.766667 | subjective |
| 322 | 824028 | London, Greater London, England, United Kingdom | let the fifth and final wave commence btc bi... | 7 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 1.000000 | subjective |
| 362 | 568086 | London, Greater London, England, United Kingdom | all details on telegram channel entry tar... | 119 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 377 | 663473 | London, Greater London, England, United Kingdom | the first ever mode post agm investor present... | 788 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.250000e-01 | Positive | 0.366667 | Factual |
| 378 | 272275 | London, Greater London, England, United Kingdom | open spot signals 10 ada rlc ... | 78 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 4.000000e-01 | Positive | 0.450000 | Factual |
| 386 | 578083 | London, Greater London, England, United Kingdom | the fiat money experiment is failing bitcoin | 18 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 398 | 624101 | London, Greater London, England, United Kingdom | huge bitcoin inflow to gemini behind the drop ... | 8249 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.775558e-17 | Positive | 0.800000 | subjective |
| 436 | 558763 | London, Greater London, England, United Kingdom | we don t buy bitcoin due to the new unique... | 851 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.385281e-01 | Positive | 0.594697 | subjective |
| 466 | 130805 | London, Greater London, England, United Kingdom | elsalvador becomes the first country to make ... | 469 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.250000e-01 | Positive | 0.266667 | Factual |
In [155]:
bri2 = bitcoin_tweet[bitcoin_tweet["locations"] == 'United Kingdom']
In [156]:
bri2
Out[156]:
| | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 127 | 349546 | United Kingdom | what would the government do if the youth of t... | 261 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.0 | Neutral | 0.0 | Factual |
| 137 | 531787 | United Kingdom | swedish man sentenced for gold backed cryptocu... | 173 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.0 | Neutral | 0.0 | Factual |
| 376 | 498148 | United Kingdom | privacy focused crypto is launching an inc... | 5840 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.0 | Neutral | 0.0 | Factual |
| 411 | 556330 | United Kingdom | it s all profit cryptocurrency made me a milli... | 2332 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.0 | Neutral | 0.0 | Factual |
| 460 | 412787 | United Kingdom | buy bitcoin and hold newhigh | 12 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.0 | Neutral | 0.0 | Factual |
In [157]:
br3 = bitcoin_tweet[bitcoin_tweet["locations"] == 'London']
In [158]:
br3
Out[158]:
| | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36 | 891204 | London, Greater London, England, United Kingdom | good traders vs bad traders a bad trade on th... | 1463 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | -0.217857 | Negative | 0.539286 | subjective |
| 256 | 157176 | London, Greater London, England, United Kingdom | you still tweeting about bitcoin wait u... | 43 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.142857 | Positive | 0.267857 | Factual |
| 475 | 367564 | London, Greater London, England, United Kingdom | so after a very long wait tomorrow is the sta... | 76 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.267500 | Positive | 0.710000 | subjective |
In [159]:
br4 = bitcoin_tweet[bitcoin_tweet["locations"] == 'London, England 🇬🇧']
br4
Out[159]:
| | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 76 | 663255 | London, Greater London, England, United Kingdom | i m encouraged by this morning s senate bankin... | 4285 | London, England 🇬🇧 | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.0 | Neutral | 0.0 | Factual |
| 433 | 438568 | London, Greater London, England, United Kingdom | ta bitcoin trim losses why bulls need to ove... | 4299 | London, England 🇬🇧 | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.5 | Positive | 0.5 | Neutral |
In [160]:
br5 = bitcoin_tweet[bitcoin_tweet["locations"] == 'England, United Kingdom']
br5
Out[160]:
| | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 123 | 890689 | England, United Kingdom | live bitcoin price 46 731 an increase ... | 39938 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 0.068182 | Positive | 0.283333 | Factual |
| 159 | 739840 | England, United Kingdom | fair launching today 6pm utc do not miss... | 320 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 0.291667 | Positive | 0.550000 | subjective |
| 196 | 228331 | England, United Kingdom | bitcoin wow some people woke up thinking thi... | 7 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 0.141667 | Positive | 0.741667 | subjective |
| 423 | 293677 | England, United Kingdom | live bitcoin price 35 494 an increase ... | 39946 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 0.068182 | Positive | 0.283333 | Factual |
| 487 | 423690 | England, United Kingdom | moving up slowly inertia creeps moving up slo... | 5 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | -0.300000 | Negative | 0.400000 | Factual |

Concatenate the five UK subsets row-wise into a single DataFrame called bitcoin_tweetUK.

In [161]:
bitcoin_tweetUK = pd.concat([bri, bri2, br3, br4, br5], axis=0)
In [162]:
bitcoin_tweetUK
Out[162]:
| | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 419791 | London, Greater London, England, United Kingdom | thanks for giving us such a great opportunit... | 129 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.100000e-01 | Positive | 0.340000 | Factual |
| 54 | 295173 | London, Greater London, England, United Kingdom | bitcoin fear index on google search statistics... | 202 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.750000e-01 | Positive | 0.500000 | Neutral |
| 103 | 820864 | London, Greater London, England, United Kingdom | no trade btc price is 30888 at time 08 07 21... | 1523 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 112 | 505680 | London, Greater London, England, United Kingdom | grayscale pairs with coindesk index to launch ... | 62 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 117 | 271016 | London, Greater London, England, United Kingdom | it was good while it lasted now it is time ... | 129 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.500000e-01 | Positive | 0.300000 | Factual |
| 234 | 513485 | London, Greater London, England, United Kingdom | grayscale investments launches defi fund now... | 62 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 1.000000e-01 | Positive | 0.000000 | Factual |
| 264 | 520206 | London, Greater London, England, United Kingdom | ready for elon to say some stupid shit trigger... | 63 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | -2.666667e-01 | Negative | 0.766667 | subjective |
| 322 | 824028 | London, Greater London, England, United Kingdom | let the fifth and final wave commence btc bi... | 7 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 1.000000 | subjective |
| 362 | 568086 | London, Greater London, England, United Kingdom | all details on telegram channel entry tar... | 119 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 377 | 663473 | London, Greater London, England, United Kingdom | the first ever mode post agm investor present... | 788 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.250000e-01 | Positive | 0.366667 | Factual |
| 378 | 272275 | London, Greater London, England, United Kingdom | open spot signals 10 ada rlc ... | 78 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 4.000000e-01 | Positive | 0.450000 | Factual |
| 386 | 578083 | London, Greater London, England, United Kingdom | the fiat money experiment is failing bitcoin | 18 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 398 | 624101 | London, Greater London, England, United Kingdom | huge bitcoin inflow to gemini behind the drop ... | 8249 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.775558e-17 | Positive | 0.800000 | subjective |
| 436 | 558763 | London, Greater London, England, United Kingdom | we don t buy bitcoin due to the new unique... | 851 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.385281e-01 | Positive | 0.594697 | subjective |
| 466 | 130805 | London, Greater London, England, United Kingdom | elsalvador becomes the first country to make ... | 469 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.250000e-01 | Positive | 0.266667 | Factual |
| 127 | 349546 | United Kingdom | what would the government do if the youth of t... | 261 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 137 | 531787 | United Kingdom | swedish man sentenced for gold backed cryptocu... | 173 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 376 | 498148 | United Kingdom | privacy focused crypto is launching an inc... | 5840 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 411 | 556330 | United Kingdom | it s all profit cryptocurrency made me a milli... | 2332 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 460 | 412787 | United Kingdom | buy bitcoin and hold newhigh | 12 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 36 | 891204 | London, Greater London, England, United Kingdom | good traders vs bad traders a bad trade on th... | 1463 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | -2.178571e-01 | Negative | 0.539286 | subjective |
| 256 | 157176 | London, Greater London, England, United Kingdom | you still tweeting about bitcoin wait u... | 43 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 1.428571e-01 | Positive | 0.267857 | Factual |
| 475 | 367564 | London, Greater London, England, United Kingdom | so after a very long wait tomorrow is the sta... | 76 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.675000e-01 | Positive | 0.710000 | subjective |
| 76 | 663255 | London, Greater London, England, United Kingdom | i m encouraged by this morning s senate bankin... | 4285 | London, England 🇬🇧 | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
| 433 | 438568 | London, Greater London, England, United Kingdom | ta bitcoin trim losses why bulls need to ove... | 4299 | London, England 🇬🇧 | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 5.000000e-01 | Positive | 0.500000 | Neutral |
| 123 | 890689 | England, United Kingdom | live bitcoin price 46 731 an increase ... | 39938 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 6.818182e-02 | Positive | 0.283333 | Factual |
| 159 | 739840 | England, United Kingdom | fair launching today 6pm utc do not miss... | 320 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 2.916667e-01 | Positive | 0.550000 | subjective |
| 196 | 228331 | England, United Kingdom | bitcoin wow some people woke up thinking thi... | 7 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 1.416667e-01 | Positive | 0.741667 | subjective |
| 423 | 293677 | England, United Kingdom | live bitcoin price 35 494 an increase ... | 39946 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 6.818182e-02 | Positive | 0.283333 | Factual |
| 487 | 423690 | England, United Kingdom | moving up slowly inertia creeps moving up slo... | 5 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | -3.000000e-01 | Negative | 0.400000 | Factual |
In [163]:
bitcoin_tweetUK.shape
Out[163]:
(30, 12)
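
The five separate filters and the concatenation can also be written as a single membership test. The following is a minimal sketch, assuming the same `locations` column; the small `sample` frame is hypothetical stand-in data, not the project's dataset, and `uk_labels` and `sample_uk` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for bitcoin_tweet; only the columns needed here.
sample = pd.DataFrame({
    "locations": ["London, England", "Bogor", "United Kingdom",
                  "London", "Venezuela", "England, United Kingdom"],
    "polarity": [0.31, 0.0, 0.0, -0.22, 0.1, 0.07],
})

# The five location strings that were filtered one by one above.
uk_labels = [
    "London, England",
    "United Kingdom",
    "London",
    "London, England 🇬🇧",
    "England, United Kingdom",
]

# isin keeps every row whose location matches any of the labels.
sample_uk = sample[sample["locations"].isin(uk_labels)]
print(sample_uk.shape)
```

Unlike `pd.concat`, this keeps the rows in their original index order rather than grouping them by location string; the set of selected rows is the same.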

Next, I plot the count of each polarity label for the UK tweets.

In [164]:
plt.figure(figsize=(12,4))
sns.countplot(x='polarity_label', data=bitcoin_tweetUK)
plt.title('Polarity Label Count for UK')
plt.show()
[Figure: count plot of polarity_label for the UK tweets, titled "Polarity Label Count for UK"]
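
Exact counts are easier to read from a table than from the bars of a plot. The sketch below transcribes the polarity labels from the 30 rows of bitcoin_tweetUK printed above and tallies them with `value_counts`; the `labels` list and the name `polarity_counts` are illustrative, and on the real frame the equivalent call would be `bitcoin_tweetUK["polarity_label"].value_counts()`.

```python
import pandas as pd

# Polarity labels transcribed from the 30 rows of bitcoin_tweetUK shown above,
# grouped by the original location string.
labels = (
    ["Positive"] * 9 + ["Neutral"] * 5 + ["Negative"] * 1  # 'London, England'
    + ["Neutral"] * 5                                       # 'United Kingdom'
    + ["Positive"] * 2 + ["Negative"] * 1                   # 'London'
    + ["Positive"] * 1 + ["Neutral"] * 1                    # 'London, England 🇬🇧'
    + ["Positive"] * 4 + ["Negative"] * 1                   # 'England, United Kingdom'
)

# Tally each label; value_counts sorts by frequency, highest first.
polarity_counts = pd.Series(labels, name="polarity_label").value_counts()
print(polarity_counts)
```

For the rows listed above this gives 16 Positive, 11 Neutral and 3 Negative tweets.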

Subjectivity Count for UK

In [165]:
plt.figure(figsize=(12,4))
sns.countplot(x='subjectivity_label', data=bitcoin_tweetUK)
plt.title('Subjectivity Label Count for UK')
plt.show()
[Figure: count plot of subjectivity_label for the UK tweets, titled "Subjectivity Label Count for UK"]
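
The label columns are derived from numeric polarity and subjectivity scores. The sketch below shows one way such labels could be assigned. The thresholds are inferred from the rows printed above (for example, a subjectivity of exactly 0.5 appears as Neutral) and are an assumption, not the author's code. The tolerance `tol` is also hypothetical: it treats floating-point residue such as the 2.775558e-17 polarity in row 398 as zero, whereas the table above labels that row Positive.

```python
import math


def label_polarity(score, tol=1e-9):
    """Map a polarity score in [-1, 1] to a label; |score| <= tol counts as zero."""
    if math.isclose(score, 0.0, abs_tol=tol):
        return "Neutral"
    return "Positive" if score > 0 else "Negative"


def label_subjectivity(score, tol=1e-9):
    """Map a subjectivity score in [0, 1] to a label; 0.5 is the neutral midpoint."""
    if math.isclose(score, 0.5, abs_tol=tol):
        return "Neutral"
    return "subjective" if score > 0.5 else "Factual"


# Scores taken from rows 398, 54 and 487 of the UK table.
for polarity, subjectivity in [(2.775558e-17, 0.8), (0.375, 0.5), (-0.3, 0.4)]:
    print(label_polarity(polarity), label_subjectivity(subjectivity))
```

With a tolerance in place, a tweet whose positive and negative words cancel out is counted as Neutral instead of being pushed into the Positive group by rounding error.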

Storify/Interpretation

Introduction

Bitcoin is a cryptocurrency that runs on a peer-to-peer network. It lets users exchange value digitally without the intervention of a third party. Bitcoin relies on solving cryptographic puzzles to produce valid hashes, and its supply is limited. This work sets out to review the sentiments expressed about Bitcoin in tweets.

My Review Findings:

  1. The result of my analysis shows that there is a general awareness of Bitcoin all over the world.

  2. The world is favourably disposed to Bitcoin; only about 60 of the 500 sampled tweets expressed negative sentiment about it.

  3. About 230 tweets expressed positive sentiments, and 210 tweets were neutral.

  4. I also examined the degree of negativity expressed; about 80% of the negative tweets expressed very low negativity, e.g., a polarity between -0.1 and -0.09.

  5. The UK sample shows a similar pattern: only 3 of the 30 sampled tweets expressed negative sentiments about Bitcoin. The degree of negativity was again low.

  6. 16 tweets expressed positive sentiments, and 11 tweets were neutral.

  7. I attempted to understand if these sentiments were factual or subjective.

  8. The tweets were overwhelmingly factual; only about 130 tweets were subjective worldwide.

  9. In the United Kingdom, only eight of the 30 tweets were labelled subjective.

  10. Given that the overwhelming majority of tweets were factual, both in the UK and worldwide, this analysis can be taken seriously.

  11. One negative tweet in the UK came from the Brighton area, around Bexhill-on-Sea. Another negative tweet came from the Leicester area, around Hinckley. The positive tweets are evenly spread across the UK.

  12. Among the neighbouring countries to the UK, the Republic of Ireland does not have a negative tweet about Bitcoin. France and the Netherlands have one negative comment each.
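
The worldwide counts quoted in the findings are easier to compare as shares of the sample. A short worked calculation, using only the approximate counts reported above (the dictionary and variable names are illustrative):

```python
# Approximate worldwide counts as reported in the findings above.
world_counts = {"Positive": 230, "Neutral": 210, "Negative": 60}
subjective_worldwide = 130

# The three polarity groups together make up the full sample.
total = sum(world_counts.values())

# Convert each count to a percentage of the sample.
world_shares = {label: round(100 * n / total, 1) for label, n in world_counts.items()}
subjective_share = round(100 * subjective_worldwide / total, 1)

print(total)             # 500
print(world_shares)      # {'Positive': 46.0, 'Neutral': 42.0, 'Negative': 12.0}
print(subjective_share)  # 26.0
```

On these figures, roughly 12% of the sampled tweets are negative and about 26% are subjective.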

Conclusion and Policy Implications

I have presented the sentiment polarity and subjectivity of Bitcoin tweets from the United Kingdom and around the world. In both samples, negative tweets are a small minority, with most tweets positive or neutral, and the sentiments expressed are mostly factual. I therefore consider these tweets trustworthy enough to inform policy. I suggest that:

  1. The UK government could sponsor research on the benefits of Bitcoin and cryptocurrencies to the growth of its economy.

  2. Bitcoin could be regarded as a digital legal tender.

  3. Universities in the UK should be encouraged to offer Bitcoin, cryptocurrency, and cryptography as courses of study.

  4. Scholarships and grants should be given to students who wish to study Bitcoin and its cryptographic mechanisms.

  5. Research should be done to make Bitcoin a more secure digital currency. The negative sentiments about Bitcoin centred on fears about its security.

  6. There is also a fear arising from the lack of central ownership of Bitcoin. I advise the government to set up regulatory bodies for Bitcoin to allay these fears.

References

Crickard, P. (2018). Mastering geospatial analysis with Python: Explore GIS processing and learn to work with GeoDjango, CARTOframes and MapboxGL Jupyter. Packt Publishing.

Gabby, A. (2023). Python Sentiment Analysis using TextBlob and VADER for Glassdoor Reviews. Medium. https://medium.com/@gabya06/python-sentiment-analysis-using-textblob-and-vader-for-glassdoor-reviews-cc9632babb73

GeoPandas Documentation (2023). Geopandas.GeoDataFrame.explore — GeoPandas Docs. Available at: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html.

GeoPandas Documentation (2023). Creating a GeoDataFrame from a DataFrame with coordinates — GeoPandas Docs.

Deparkes (2016). Folium Map Tiles - deparkes. Available at: https://deparkes.co.uk/2016/06/10/folium-map-tiles/.

Intelligent Economist, (2020). Malthusian Theory Of Population - Intelligent Economist. Available at: https://www.intelligenteconomist.com/malthusian-theory/

Muhammad, U. S. (2023). A Comparison of NLTK and TextBlob for Text Analysis. Medium. https://medium.com/@umarsmuhammed/a-comparison-of-nltk-and-textblob-for-text-analysis-bd9ebcd0ecd9

Oluyale, D. (2023, September 16). Sentiment Analysis using various options in Python Machine Learning. Medium. https://medium.com/@oluyaled/sentiment-analysis-using-various-options-in-python-machine-learning-aaa24ea0991c

re — Regular expression operations. (2023). Python Documentation. Available at: https://docs.python.org/3/library/re.html Accessed: 29/11/2023

TextBlob: Simplified Text Processing — TextBlob 0.16.0 documentation. (2023). https://textblob.readthedocs.io/en/dev/
