Mapping and Analyzing Geospatial Trends: A Python Approach
By
Imonikhe Ayeni
INTRODUCTION
This project, titled "Mapping and Analyzing Geospatial Trends: A Python Approach", explores the integration of geospatial data visualization and sentiment analysis using Python. The work is organized into three distinct tasks, each focusing on a specific aspect of geospatial data handling:
- Geospatial Visualization:
This task demonstrates the use of Python-based geospatial visualization tools, specifically GeoPandas, to analyze and present real-world datasets. GeoPandas facilitates the creation of both static and interactive maps, providing a comprehensive view of the geospatial information.
- Geospatial Data Analysis:
This task covers methods for analyzing geospatial datasets. It uses the World Total Population dataset from the World Bank and the World Cereal Yield dataset. The objective is to identify significant relationships between the populations of different countries and their cereal yields, and to show the correlations and insights that such an analysis can reveal.
- Geospatial Sentiment Analysis:
In this task, geospatial sentiment analysis is applied to Twitter (now X) data using the Python library TextBlob. The dataset comprises tweets relevant to cryptocurrency. The task aims to extract and map the sentiments expressed in these tweets, highlighting the geographical distribution of opinions and trends in the cryptocurrency domain.
OBJECTIVES
The objectives of this project are as follows:
- Identify the Concepts Underlying Geospatial Analysis:
Identify the concepts underlying geospatial analysis through the application of Python-based tools and libraries.
- Apply Social Analytics Techniques:
Implement appropriate techniques to analyze social information, particularly through the sentiment analysis of Twitter data related to cryptocurrency.
- Design and Implement a Geospatial Analysis Framework:
Develop, prototype, and execute a comprehensive framework for geospatial analysis, integrating various datasets and visualization methods to derive meaningful insights.
- Identify Relationships Between Geospatial Datasets:
Investigate and identify significant relationships between different geospatial datasets, such as the correlation between the population of different countries and their cereal yield.
- Enhance Data Visualization Skills:
Improve skills in creating both static and interactive geospatial visualizations, making complex data more accessible and understandable to a diverse audience.
METHODOLOGY
The methodology for this project involves the following steps, corresponding to each of the three main tasks:
- Geospatial Visualization
- Data Collection:
Obtain a real-world dataset suitable for geospatial analysis.
- Data Preparation:
Clean and preprocess the dataset to ensure it is ready for analysis using GeoPandas.
- Visualization:
Use GeoPandas' .plot() and .explore() methods to create both static and interactive maps. Employ additional libraries such as Folium and Mapclassify to enhance the interactivity and classification of the maps.
- Presentation:
Display the static maps for quick insights and interactive maps for detailed exploration, allowing users to hover, zoom, and interact with the geospatial data.
- Geospatial Data Analysis
- Data Collection:
Source the World Total Population dataset from the World Bank and the World Cereal Yield dataset.
- Data Preparation:
Clean and preprocess the datasets to ensure compatibility for analysis.
- Analysis:
Use statistical and geospatial analysis techniques to explore the relationship between the population of different countries and their cereal yield. Employ visualization tools to present the findings in a clear and interpretable manner.
- Interpretation:
Analyze the results to identify significant correlations or trends, providing insights into the potential relationship between population and cereal yield.
- Geospatial Sentiment Analysis
- Data Collection:
Collect tweets relevant to cryptocurrency from Twitter (now X).
- Data Preparation:
Clean and preprocess the tweet dataset, including text normalization and location extraction.
- Sentiment Analysis:
Use the TextBlob library to perform sentiment analysis on the tweets, categorizing them into positive, negative, and neutral sentiments.
- Geospatial Mapping:
Map the sentiments geographically using GeoPandas to visualize the distribution of opinions and trends across different regions.
- Interpretation:
Analyze the mapped sentiments to identify regional trends and insights related to cryptocurrency discussions on Twitter.
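The sentiment-labelling step in the methodology above can be sketched as follows. This is a minimal illustration, not the notebook's implementation: it assumes the polarity score comes from TextBlob's `TextBlob(text).sentiment.polarity`, which lies between -1.0 and 1.0, and the function name `label_sentiment` and its `neutral_band` parameter are hypothetical.

```python
def label_sentiment(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    The score is assumed to come from TextBlob(text).sentiment.polarity.
    `neutral_band` is a hypothetical tolerance around zero, not a TextBlob setting.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Hand-written example scores standing in for TextBlob output.
example_scores = {"tweet_a": 0.6, "tweet_b": -0.25, "tweet_c": 0.0}
labels = {name: label_sentiment(score) for name, score in example_scores.items()}
print(labels)
```

Widening `neutral_band` (for example to 0.1) would treat weakly worded tweets as neutral. Whether that is appropriate depends on the data.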
My first step will be to install my libraries. Note the installation of folium and mapclassify: they will help with the interactive maps.
!pip install geopandas folium matplotlib mapclassify
Successfully installed branca-0.7.2 folium-0.17.0 geopandas-1.0.1 mapclassify-2.8.0 pyogrio-0.9.0 pyproj-3.6.1 shapely-2.0.5
I am going to be using interactive maps in this work, so I am importing folium and mapclassify. Pandas will help me set up and manage dataframes. GeoPandas will give me the geometry column, which is necessary for mapping. Matplotlib will assist me with visualisation. GeoPandas is a Python library for working with vector data and is built on the pandas library (Crickard, 2018).
import geopandas as gpd # import the necessary dependencies
import pandas as pd
import matplotlib.pyplot as plt
import folium
import mapclassify
Task 1.1: Application of Python-based geospatial visualisation tool (e.g., GeoPandas) on a real-world dataset
INSTRUCTION
This task requires you to use the dataset, cereal yield. Use a Python-based visualisation tool (such as GeoPandas) to plot a set of choropleth maps representing the world cereal yield (kg per hectare) for the years 2019 and 2020 respectively. The solution should be in a Jupyter notebook (.ipynb), wherein all the functions, libraries and coding steps should be explained in a lucid manner. Major steps for generating the choropleths would typically involve importing the datasets using appropriate Python libraries, data cleaning, geospatial operations, and plotting. The Jupyter Notebook should be able to reproduce the choropleth maps without any error.
df_cereal_yield = pd.read_csv(r'API_AG.YLD.CREL.KG_DS2_en_csv_v2_5734359.csv', skiprows=4) # Read the dataset with pandas; skiprows=4 skips the four lines before the header row, which hold no usable values
df_cereal_yield.tail() #inspect dataset
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | Unnamed: 67 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
261 | Kosovo | XKX | Cereal yield (kg per hectare) | AG.YLD.CREL.KG | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
262 | Yemen, Rep. | YEM | Cereal yield (kg per hectare) | AG.YLD.CREL.KG | NaN | 782.5 | 780.7 | 771.8 | 776.1 | 773.6 | ... | 962.7 | 784.2 | 687.0 | 699.0 | 682.8 | 864.9 | 861.1 | 791.8 | NaN | NaN |
263 | South Africa | ZAF | Cereal yield (kg per hectare) | AG.YLD.CREL.KG | NaN | 1099.1 | 1142.1 | 1128.0 | 913.9 | 911.4 | ... | 4899.6 | 3348.4 | 3623.1 | 5331.8 | 4652.1 | 4101.4 | 5120.6 | 5124.7 | NaN | NaN |
264 | Zambia | ZMB | Cereal yield (kg per hectare) | AG.YLD.CREL.KG | NaN | 822.2 | 801.4 | 706.9 | 788.9 | 823.5 | ... | 2774.9 | 3026.4 | 2432.2 | 2489.9 | 2168.1 | 2400.4 | 2481.6 | 2525.0 | NaN | NaN |
265 | Zimbabwe | ZWE | Cereal yield (kg per hectare) | AG.YLD.CREL.KG | NaN | 919.7 | 905.9 | 822.5 | 820.5 | 930.8 | ... | 831.4 | 557.5 | 435.1 | 1203.3 | 1254.3 | 748.0 | 1148.6 | 1545.2 | NaN | NaN |
5 rows × 68 columns
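To illustrate what `skiprows=4` does, here is a small sketch that reads an in-memory file. The sample text is made up and only mimics the layout assumed for the World Bank download (four preamble lines, then the header row). The date in it is a placeholder, and the two data rows reuse values shown in the notebook's tables.

```python
import io

import pandas as pd

# Made-up text mimicking the assumed layout of a World Bank CSV download:
# four preamble lines (two of them blank), then the header row, then the data.
sample = (
    '"Data Source","World Development Indicators",\n'
    '\n'
    '"Last Updated Date","2023-01-01",\n'  # placeholder date
    '\n'
    '"Country Name","Country Code","2019","2020",\n'
    '"Zambia","ZMB","2400.4","2481.6",\n'
    '"Zimbabwe","ZWE","748.0","1148.6",\n'
)

# Skip the four preamble lines so that line five becomes the header.
raw_df = pd.read_csv(io.StringIO(sample), skiprows=4)
print(raw_df.columns.tolist())

# The trailing comma on every line produces an empty "Unnamed: N" column;
# dropping all-NaN columns removes it.
df = raw_df.dropna(axis=1, how="all")
print(df)
```

The empty trailing column corresponds to the `Unnamed: 67` column visible in the real dataset. Selecting only the needed columns, as the notebook does later, removes it as well.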
#Inspect Dataset
df_cereal_yield.columns
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', 'Unnamed: 67'], dtype='object')
df_cereal_yield.isna().sum()
Country Name 0
Country Code 0
Indicator Name 0
Indicator Code 0
1960 266
...
2019 39
2020 39
2021 39
2022 266
Unnamed: 67 266
Length: 68, dtype: int64
#extract useful columns according to the instruction
useful_columns = ['Country Name','Country Code','2019', '2020']
cereal_yield = df_cereal_yield[useful_columns]
cereal_yield
| | Country Name | Country Code | 2019 | 2020 |
---|---|---|---|---|
0 | Aruba | ABW | NaN | NaN |
1 | Africa Eastern and Southern | AFE | 1717.894885 | 1838.762607 |
2 | Afghanistan | AFG | 2113.400000 | 1979.900000 |
3 | Africa Western and Central | AFW | 1343.462790 | 1381.643141 |
4 | Angola | AGO | 958.800000 | 992.500000 |
... | ... | ... | ... | ... |
261 | Kosovo | XKX | NaN | NaN |
262 | Yemen, Rep. | YEM | 864.900000 | 861.100000 |
263 | South Africa | ZAF | 4101.400000 | 5120.600000 |
264 | Zambia | ZMB | 2400.400000 | 2481.600000 |
265 | Zimbabwe | ZWE | 748.000000 | 1148.600000 |
266 rows × 4 columns
cereal_yield.isna().sum() #view null values
Country Name 0
Country Code 0
2019 39
2020 39
dtype: int64
I am going to drop the rows with null values. I cannot fill the null values with the mean or mode, because each country has its own peculiarities.
cereal_yield = cereal_yield.dropna(subset=['2019', '2020']) # reassign instead of using inplace=True, which triggers a SettingWithCopyWarning on this column slice
cereal_yield
| | Country Name | Country Code | 2019 | 2020 |
---|---|---|---|---|
1 | Africa Eastern and Southern | AFE | 1717.894885 | 1838.762607 |
2 | Afghanistan | AFG | 2113.400000 | 1979.900000 |
3 | Africa Western and Central | AFW | 1343.462790 | 1381.643141 |
4 | Angola | AGO | 958.800000 | 992.500000 |
5 | Albania | ALB | 5038.200000 | 5209.200000 |
... | ... | ... | ... | ... |
259 | World | WLD | 4125.447810 | 4116.427597 |
262 | Yemen, Rep. | YEM | 864.900000 | 861.100000 |
263 | South Africa | ZAF | 4101.400000 | 5120.600000 |
264 | Zambia | ZMB | 2400.400000 | 2481.600000 |
265 | Zimbabwe | ZWE | 748.000000 | 1148.600000 |
227 rows × 4 columns
cereal_yield.isna().sum() #all null values removed
Country Name 0
Country Code 0
2019 0
2020 0
dtype: int64
I will now bring in the GeoPandas dataset (naturalearth_lowres) and merge it with the cereal yield data.
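The column selection and row-dropping steps above can be sketched on a toy frame. This is an illustrative variant rather than the notebook's exact code: it takes an explicit `.copy()` of the column subset and reassigns the result of `dropna`, which is one common way to avoid pandas' copy-versus-view ambiguity. The four rows reuse values from the tables above.

```python
import numpy as np
import pandas as pd

# Toy frame with the same shape of problem: some countries have no yield values.
raw = pd.DataFrame(
    {
        "Country Name": ["Aruba", "Angola", "Kosovo", "Zambia"],
        "Country Code": ["ABW", "AGO", "XKX", "ZMB"],
        "Indicator Code": ["AG.YLD.CREL.KG"] * 4,
        "2019": [np.nan, 958.8, np.nan, 2400.4],
        "2020": [np.nan, 992.5, np.nan, 2481.6],
    }
)

# Take an explicit copy of the columns of interest, so later edits
# clearly apply to a new, independent frame.
subset = raw[["Country Name", "Country Code", "2019", "2020"]].copy()

rows_before = len(subset)
# Drop rows (not columns) where either year is missing.
subset = subset.dropna(subset=["2019", "2020"])
print(f"dropped {rows_before - len(subset)} of {rows_before} rows")
```

Counting the rows before and after makes it easy to confirm how much data the cleaning step removed; in the notebook this is the drop from 266 to 227 rows.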
#check available geopandas datasets
gpd.datasets.available
['naturalearth_cities', 'naturalearth_lowres', 'nybb']
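# use naturalearth_lowres (gpd.datasets is deprecated and was removed in GeoPandas 1.0; on newer versions, download the data from the Natural Earth URL given in the warning and pass the local path to gpd.read_file)
earth = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))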
C:\Users\User\AppData\Local\Temp\ipykernel_8832\3099934314.py:2: FutureWarning: The geopandas.dataset module is deprecated and will be removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/. earth= gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
earth
| | pop_est | continent | name | iso_a3 | gdp_md_est | geometry |
---|---|---|---|---|---|---|
0 | 889953.0 | Oceania | Fiji | FJI | 5496 | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
1 | 58005463.0 | Africa | Tanzania | TZA | 63177 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
2 | 603253.0 | Africa | W. Sahara | ESH | 907 | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
3 | 37589262.0 | North America | Canada | CAN | 1736425 | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
4 | 328239523.0 | North America | United States of America | USA | 21433226 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
... | ... | ... | ... | ... | ... | ... |
172 | 6944975.0 | Europe | Serbia | SRB | 51475 | POLYGON ((18.82982 45.90887, 18.82984 45.90888... |
173 | 622137.0 | Europe | Montenegro | MNE | 5542 | POLYGON ((20.07070 42.58863, 19.80161 42.50009... |
174 | 1794248.0 | Europe | Kosovo | -99 | 7926 | POLYGON ((20.59025 41.85541, 20.52295 42.21787... |
175 | 1394973.0 | North America | Trinidad and Tobago | TTO | 24269 | POLYGON ((-61.68000 10.76000, -61.10500 10.890... |
176 | 11062113.0 | Africa | S. Sudan | SSD | 11998 | POLYGON ((30.83385 3.50917, 29.95350 4.17370, ... |
177 rows × 6 columns
#filter out useful columns
earth= earth[["iso_a3","geometry"]]
earth
| | iso_a3 | geometry |
---|---|---|
0 | FJI | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
2 | ESH | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
3 | CAN | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
4 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
... | ... | ... |
172 | SRB | POLYGON ((18.82982 45.90887, 18.82984 45.90888... |
173 | MNE | POLYGON ((20.07070 42.58863, 19.80161 42.50009... |
174 | -99 | POLYGON ((20.59025 41.85541, 20.52295 42.21787... |
175 | TTO | POLYGON ((-61.68000 10.76000, -61.10500 10.890... |
176 | SSD | POLYGON ((30.83385 3.50917, 29.95350 4.17370, ... |
177 rows × 2 columns
#rename "iso_a3" to Country Code to create a common column with cereal_yield dataframe
earth= earth.rename(columns={"iso_a3":'Country Code'})
earth #Inspect
| | Country Code | geometry |
---|---|---|
0 | FJI | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
2 | ESH | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
3 | CAN | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
4 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
... | ... | ... |
172 | SRB | POLYGON ((18.82982 45.90887, 18.82984 45.90888... |
173 | MNE | POLYGON ((20.07070 42.58863, 19.80161 42.50009... |
174 | -99 | POLYGON ((20.59025 41.85541, 20.52295 42.21787... |
175 | TTO | POLYGON ((-61.68000 10.76000, -61.10500 10.890... |
176 | SSD | POLYGON ((30.83385 3.50917, 29.95350 4.17370, ... |
177 rows × 2 columns
#we shall merge earth data frame and cereal_yield dataframe and call the eventual dataframe earth_yield
earth_yield= earth.merge(cereal_yield, on='Country Code')
earth_yield
| | Country Code | geometry | Country Name | 2019 | 2020 |
---|---|---|---|---|---|
0 | FJI | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... | Fiji | 3353.4 | 3665.7 |
1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... | Tanzania | 1883.3 | 1698.4 |
2 | CAN | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... | Canada | 4010.8 | 4095.4 |
3 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | United States | 8006.1 | 8145.3 |
4 | KAZ | POLYGON ((87.35997 49.21498, 86.59878 48.54918... | Kazakhstan | 1154.5 | 1288.6 |
... | ... | ... | ... | ... | ... |
162 | MKD | POLYGON ((22.38053 42.32026, 22.88137 41.99930... | North Macedonia | 3554.0 | 3664.2 |
163 | SRB | POLYGON ((18.82982 45.90887, 18.82984 45.90888... | Serbia | 6126.9 | 6559.4 |
164 | MNE | POLYGON ((20.07070 42.58863, 19.80161 42.50009... | Montenegro | 3166.8 | 3260.0 |
165 | TTO | POLYGON ((-61.68000 10.76000, -61.10500 10.890... | Trinidad and Tobago | 1529.7 | 1520.3 |
166 | SSD | POLYGON ((30.83385 3.50917, 29.95350 4.17370, ... | South Sudan | 880.5 | 885.5 |
167 rows × 5 columns
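The merge above uses pandas' default inner join, so rows without a matching `Country Code` on both sides are silently dropped (177 shapes and 227 yield rows become 167 merged rows). A quick way to see which codes fail to match is an outer merge with `indicator=True`. The sketch below uses a toy frame with placeholder strings instead of real geometries; the `-99` code and the `WLD` aggregate both appear in the tables above.

```python
import pandas as pd

# Placeholder "geometry" strings stand in for real polygons.
shapes = pd.DataFrame(
    {"Country Code": ["FJI", "TZA", "-99"], "geometry": ["g1", "g2", "g3"]}
)
yields = pd.DataFrame(
    {
        "Country Code": ["FJI", "TZA", "WLD"],
        "2019": [3353.4, 1883.3, 4125.447810],
    }
)

# Default inner join: only codes present in both frames survive.
inner = shapes.merge(yields, on="Country Code")

# Outer join with an indicator column to audit what was lost.
check = shapes.merge(yields, on="Country Code", how="outer", indicator=True)
unmatched_shapes = check.loc[check["_merge"] == "left_only", "Country Code"].tolist()
unmatched_yields = check.loc[check["_merge"] == "right_only", "Country Code"].tolist()

print("shapes without yield data:", unmatched_shapes)
print("yield rows without a shape:", unmatched_yields)
```

In this dataset the unmatched yield rows are largely regional aggregates such as "World" or "Africa Eastern and Southern", which have no polygon. Unmatched shapes are worth a closer look, since a placeholder code like `-99` hides a real country (Kosovo in the table above).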
I will sort my dataframe in descending order to show the five countries with the highest cereal yield and the five countries with the lowest cereal yield.
earth_yield.sort_values(by='2019', ascending= False)
| | Country Code | geometry | Country Name | 2019 | 2020 |
---|---|---|---|---|---|
79 | ARE | POLYGON ((51.57952 24.24550, 51.75744 24.29407... | United Arab Emirates | 23842.3 | 25980.3 |
81 | KWT | POLYGON ((47.97452 29.97582, 48.18319 29.53448... | Kuwait | 17625.4 | 13393.1 |
83 | OMN | MULTIPOLYGON (((55.20834 22.70833, 55.23449 23... | Oman | 13291.0 | 18835.1 |
124 | BEL | POLYGON ((6.15666 50.80372, 6.04307 50.12805, ... | Belgium | 8988.5 | 8430.6 |
125 | NLD | POLYGON ((6.90514 53.48216, 7.09205 53.14404, ... | Netherlands | 8654.2 | 7919.9 |
... | ... | ... | ... | ... | ... |
84 | VUT | MULTIPOLYGON (((167.21680 -15.89185, 167.84488... | Vanuatu | 618.2 | 609.7 |
13 | SDN | POLYGON ((24.56737 8.22919, 23.80581 8.66632, ... | Sudan | 552.9 | 489.6 |
11 | SOM | POLYGON ((41.58513 -1.68325, 40.99300 -0.85829... | Somalia | 522.2 | 502.5 |
51 | NER | POLYGON ((14.85130 22.86295, 15.09689 21.30852... | Niger | 501.5 | 560.2 |
75 | GMB | POLYGON ((-16.71373 13.59496, -15.62460 13.623... | Gambia, The | 443.8 | 501.9 |
167 rows × 5 columns
earth_yield.sort_values(by='2020', ascending= False)
| | Country Code | geometry | Country Name | 2019 | 2020 |
---|---|---|---|---|---|
79 | ARE | POLYGON ((51.57952 24.24550, 51.75744 24.29407... | United Arab Emirates | 23842.3 | 25980.3 |
83 | OMN | MULTIPOLYGON (((55.20834 22.70833, 55.23449 23... | Oman | 13291.0 | 18835.1 |
81 | KWT | POLYGON ((47.97452 29.97582, 48.18319 29.53448... | Kuwait | 17625.4 | 13393.1 |
131 | NZL | MULTIPOLYGON (((176.88582 -40.06598, 176.50802... | New Zealand | 8205.0 | 9039.3 |
129 | NCL | POLYGON ((165.77999 -21.08000, 166.59999 -21.7... | New Caledonia | 7031.1 | 8689.9 |
... | ... | ... | ... | ... | ... |
11 | SOM | POLYGON ((41.58513 -1.68325, 40.99300 -0.85829... | Somalia | 522.2 | 502.5 |
75 | GMB | POLYGON ((-16.71373 13.59496, -15.62460 13.623... | Gambia, The | 443.8 | 501.9 |
13 | SDN | POLYGON ((24.56737 8.22919, 23.80581 8.66632, ... | Sudan | 552.9 | 489.6 |
22 | LSO | POLYGON ((28.97826 -28.95560, 29.32517 -29.257... | Lesotho | 695.5 | 433.3 |
46 | NAM | POLYGON ((19.89577 -24.76779, 19.89473 -28.461... | Namibia | 731.5 | 429.5 |
167 rows × 5 columns
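As an alternative to sorting the whole frame and reading off the ends, pandas' `nlargest` and `nsmallest` return the top and bottom rows directly. The sketch below uses a handful of the 2019 values from the table above; the variable names are illustrative.

```python
import pandas as pd

ranking = pd.DataFrame(
    {
        "Country Name": [
            "United Arab Emirates", "Kuwait", "Oman",
            "Belgium", "Niger", "Gambia, The",
        ],
        "2019": [23842.3, 17625.4, 13291.0, 8988.5, 501.5, 443.8],
    }
)

# Highest three yields, in descending order.
top3 = ranking.nlargest(3, "2019")
# Lowest two yields, in ascending order.
bottom2 = ranking.nsmallest(2, "2019")

print(top3)
print(bottom2)
```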
Fig 1: World cereal yield (kg per hectare) for the year 2019 with .plot
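fig, ax = plt.subplots(figsize=(20,16)) # create the figure and axes for the 2019 map
earth.plot(ax=ax, color="black") # base layer: every country in black, so countries without yield data stay visible
earth_yield.plot(ax=ax, column="2019", legend=True, legend_kwds={"label": "World cereal yield (kg per hectare) for the year 2019","orientation":"horizontal"}, cmap='Set1') # choropleth coloured by the 2019 yield
plt.show()
fig, ax = plt.subplots(figsize=(20,16)) # create the figure and axes for the 2020 map
earth.plot(ax=ax, color="black") # same black base layer
earth_yield.plot(ax=ax, column="2020", legend=True, legend_kwds={"label": "World cereal yield (kg per hectare) for the year 2020","orientation":"horizontal"}, cmap='Set1') # choropleth coloured by the 2020 yield
plt.show()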
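The maps above use `Set1`, which Matplotlib classifies as a qualitative colormap intended for categories. For a continuous quantity such as yield, a sequential colormap may be easier to read. The sketch below is one possible variant, not the notebook's code: it draws three made-up square "countries" with the sequential `YlGn` colormap, and all names and values in it are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Three made-up square "countries" with illustrative yield values.
toy = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"], "yield_kg_ha": [500.0, 4000.0, 9000.0]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

fig, ax = plt.subplots(figsize=(6, 3))
toy.plot(
    ax=ax,
    column="yield_kg_ha",  # values that drive the colour
    cmap="YlGn",           # sequential colormap: light = low, dark = high
    legend=True,
    legend_kwds={"label": "Yield (kg per hectare)", "orientation": "horizontal"},
    edgecolor="black",
)
ax.set_axis_off()  # hide the coordinate axes for a cleaner map
```

With a sequential colormap the legend reads as a single ramp from low to high. Because a few very large values (such as the UAE's) compress the rest of the scale, a classification scheme from mapclassify could also be worth trying.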
I am now going to use .explore() for my visualisation. The motivation for using this interactive map is explained in the methodology, and the libraries that must be installed to use .explore() are highlighted above, with reference(s) provided.
earth_yield.explore(column='2019', # make choropleth based on "2019" column
tooltip=["Country Name",'2019'], # show "country name and 2019 value" in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="openstreetmap", # I shall use "openstreetmap" tiles
cmap="Set1", # I shall use "Set1" matplotlib colormap
legend=True,
style_kwds=dict(color="black"), # use black outline
zoom_control=True,
)
earth_yield.explore(column='2019', # make choropleth based on "2019" column
tooltip=["Country Name",'2019'], # show "country name and 2019 value" in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="CartoDB positron", # I shall use "CartoDB positron" tiles
cmap="Set1", # I shall use "Set1" matplotlib colormap
legend=True,
style_kwds=dict(color="black"), # use black outline
zoom_control=True,
)
Brief Explanation:
The tables and maps show that:
- The three countries with the highest cereal yield in both 2019 and 2020 (the United Arab Emirates, Kuwait and Oman) are in Asia.
- The United Arab Emirates tops the list for both 2019 and 2020, with 23842.3 and 25980.3 kg per hectare, respectively.
- The UAE increased its yield between 2019 and 2020.
- In 2019, the 4th and 5th countries are in Europe: Belgium at 8988.5 kg per hectare and the Netherlands at 8654.2 kg per hectare.
- Belgium and the Netherlands dropped in cereal yield between 2019 and 2020 and lost their positions to New Zealand and New Caledonia.
- Policymakers may want to investigate why the yields dropped and devise measures to make cereal yields sustainable in Belgium and the Netherlands.
- African countries dominate the bottom of the ranking in both years: four of the bottom five in 2019 (the exception is Vanuatu) and all five in 2020.
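The year-on-year observations above can be checked numerically. The sketch below computes the absolute and percentage change for three of the countries mentioned, using the 2019 and 2020 values from the tables; the column names `change` and `pct_change` are illustrative.

```python
import pandas as pd

yields = pd.DataFrame(
    {
        "Country Name": ["United Arab Emirates", "Belgium", "Netherlands"],
        "2019": [23842.3, 8988.5, 8654.2],
        "2020": [25980.3, 8430.6, 7919.9],
    }
).set_index("Country Name")

# Absolute change in kg per hectare, then the change relative to 2019 in percent.
yields["change"] = yields["2020"] - yields["2019"]
yields["pct_change"] = 100 * yields["change"] / yields["2019"]

print(yields.round(1))
```

A positive `change` confirms the UAE's increase, while negative values for Belgium and the Netherlands confirm their decline.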
Task 1.2: Analysis of geospatial datasets
In this task, you are required to use one more dataset, the world's total population (source: World Bank), in addition to the cereal yield dataset used in the previous task. Both datasets are available on Moodle under the Assessment folder. All the choropleths and plots must be generated using appropriate Python-based tools.
df_total_pop = pd.read_csv(r"API_SP.POP.TOTL_DS2_en_csv_v2_4485025.csv", skiprows=4) # Read population dataframe
df_total_pop[:3] #Inspect dataframe
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | Unnamed: 66 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aruba | ABW | Population, total | SP.POP.TOTL | 54208.0 | 55434.0 | 56234.0 | 56699.0 | 57029.0 | 57357.0 | ... | 103165.0 | 103776.0 | 104339.0 | 104865.0 | 105361.0 | 105846.0 | 106310.0 | 106766.0 | 107195.0 | NaN |
1 | Africa Eastern and Southern | AFE | Population, total | SP.POP.TOTL | 130836765.0 | 134159786.0 | 137614644.0 | 141202036.0 | 144920186.0 | 148769974.0 | ... | 562601578.0 | 578075373.0 | 593871847.0 | 609978946.0 | 626392880.0 | 643090131.0 | 660046272.0 | 677243299.0 | 694665117.0 | NaN |
2 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 8996967.0 | 9169406.0 | 9351442.0 | 9543200.0 | 9744772.0 | 9956318.0 | ... | 32269592.0 | 33370804.0 | 34413603.0 | 35383028.0 | 36296111.0 | 37171922.0 | 38041757.0 | 38928341.0 | 39835428.0 | NaN |
3 rows × 67 columns
df_total_pop.columns
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', 'Unnamed: 66'], dtype='object')
My next step will be to select the columns needed to answer the questions in this task.
necessary_columns = ['Country Name','Country Code', '2010', '2011', '2012', '2013',
'2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021']
total_pop=df_total_pop[necessary_columns]
total_pop.isna().sum() # I shall ignore the null values in this case as they have no direct impact on my work
Country Name 0
Country Code 0
2010 1
2011 1
2012 2
2013 2
2014 2
2015 2
2016 2
2017 2
2018 2
2019 2
2020 2
2021 2
dtype: int64
total_pop[:2] #inspect dataframe
| | Country Name | Country Code | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aruba | ABW | 101665.0 | 102050.0 | 102565.0 | 103165.0 | 103776.0 | 104339.0 | 104865.0 | 105361.0 | 105846.0 | 106310.0 | 106766.0 | 107195.0 |
1 | Africa Eastern and Southern | AFE | 518468229.0 | 532760424.0 | 547482863.0 | 562601578.0 | 578075373.0 | 593871847.0 | 609978946.0 | 626392880.0 | 643090131.0 | 660046272.0 | 677243299.0 | 694665117.0 |
total_pop.columns #confirming I have picked all useful columns
Index(['Country Name', 'Country Code', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021'], dtype='object')
cereal_yield2 = df_cereal_yield[necessary_columns] # reuse the df_cereal_yield dataframe from Task 1, extract the necessary columns and store them in a fresh dataframe
cereal_yield2[:2]
| | Country Name | Country Code | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aruba | ABW | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | Africa Eastern and Southern | AFE | 1553.204407 | 1492.710513 | 1650.256751 | 1533.952962 | 1636.00973 | 1616.362162 | 1490.807738 | 1764.116707 | 1728.295922 | 1717.894885 | 1838.762607 | 1840.899744 |
My next step will be to merge the two dataframes. I have given their columns suffixes to avoid confusion.
pop_yield = cereal_yield2.merge(total_pop, on='Country Code', suffixes=('_yield', '_pop'))
pop_yield[:3]
| | Country Name_yield | Country Code | 2010_yield | 2011_yield | 2012_yield | 2013_yield | 2014_yield | 2015_yield | 2016_yield | 2017_yield | ... | 2012_pop | 2013_pop | 2014_pop | 2015_pop | 2016_pop | 2017_pop | 2018_pop | 2019_pop | 2020_pop | 2021_pop |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aruba | ABW | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 102565.0 | 103165.0 | 103776.0 | 104339.0 | 104865.0 | 105361.0 | 105846.0 | 106310.0 | 106766.0 | 107195.0 |
1 | Africa Eastern and Southern | AFE | 1553.204407 | 1492.710513 | 1650.256751 | 1533.952962 | 1636.00973 | 1616.362162 | 1490.807738 | 1764.116707 | ... | 547482863.0 | 562601578.0 | 578075373.0 | 593871847.0 | 609978946.0 | 626392880.0 | 643090131.0 | 660046272.0 | 677243299.0 | 694665117.0 |
2 | Afghanistan | AFG | 2011.100000 | 1659.900000 | 2029.600000 | 2048.500000 | 2017.50000 | 2132.200000 | 1980.400000 | 2022.500000 | ... | 31161378.0 | 32269592.0 | 33370804.0 | 34413603.0 | 35383028.0 | 36296111.0 | 37171922.0 | 38041757.0 | 38928341.0 | 39835428.0 |
3 rows × 27 columns
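How the suffixes behave can be seen on a toy example. The sketch below uses two small stand-in frames rather than the World Bank files (the yield value is illustrative only), and `validate="one_to_one"` is an optional extra check that is not part of the original workflow.

```python
import pandas as pd

# Toy stand-ins for cereal_yield2 and total_pop (yield value is illustrative only)
yield_df = pd.DataFrame({
    "Country Name": ["Aruba", "Afghanistan"],
    "Country Code": ["ABW", "AFG"],
    "2021": [None, 2000.0],
})
pop_df = pd.DataFrame({
    "Country Name": ["Aruba", "Afghanistan"],
    "Country Code": ["ABW", "AFG"],
    "2021": [107195.0, 39835428.0],
})

# Columns that share a name (other than the key) receive the suffixes;
# validate raises a MergeError if a country code is duplicated on either side.
merged = yield_df.merge(
    pop_df,
    on="Country Code",
    suffixes=("_yield", "_pop"),
    validate="one_to_one",
)
print(merged.columns.tolist())
```

The key column keeps its name, while every other shared column appears twice, once per source.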
The next step is to recall my earth dataframe and merge it with the pop_yield dataframe (containing population and cereal yield). The merge is on their common Country Code column.
earth_yield2 = earth.merge(pop_yield, on='Country Code')
earth_yield2
Country Code | geometry | Country Name_yield | 2010_yield | 2011_yield | 2012_yield | 2013_yield | 2014_yield | 2015_yield | 2016_yield | ... | 2012_pop | 2013_pop | 2014_pop | 2015_pop | 2016_pop | 2017_pop | 2018_pop | 2019_pop | 2020_pop | 2021_pop | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FJI | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... | Fiji | 2871.1 | 2821.3 | 2423.4 | 2999.5 | 4475.4 | 3001.0 | 3000.9 | ... | 865065.0 | 865602.0 | 866447.0 | 868632.0 | 872406.0 | 877460.0 | 883490.0 | 889955.0 | 896444.0 | 902899.0 |
1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... | Tanzania | 1647.9 | 1390.4 | 1314.8 | 1418.0 | 1524.8 | 1449.4 | 1570.4 | ... | 47053033.0 | 48483132.0 | 49960563.0 | 51482638.0 | 53049231.0 | 54660345.0 | 56313444.0 | 58005461.0 | 59734213.0 | 61498438.0 |
2 | CAN | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... | Canada | 3501.1 | 3552.4 | 3456.3 | 4160.4 | 3647.0 | 3673.0 | 4239.1 | ... | 34714222.0 | 35082954.0 | 35437435.0 | 35702908.0 | 36109487.0 | 36545236.0 | 37065084.0 | 37601230.0 | 38037204.0 | 38246108.0 |
3 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | United States | 6978.1 | 6803.5 | 5911.9 | 7300.9 | 7638.1 | 7430.1 | 8614.2 | ... | 313877662.0 | 316059947.0 | 318386329.0 | 320738994.0 | 323071755.0 | 325122128.0 | 326838199.0 | 328329953.0 | 331501080.0 | 331893745.0 |
4 | KAZ | POLYGON ((87.35997 49.21498, 86.59878 48.54918... | Kazakhstan | 804.1 | 1688.6 | 865.0 | 1164.9 | 1172.7 | 1278.1 | 1347.7 | ... | 16792090.0 | 17035551.0 | 17288285.0 | 17542806.0 | 17794055.0 | 18037776.0 | 18276452.0 | 18513673.0 | 18755666.0 | 19002586.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
164 | MKD | POLYGON ((22.38053 42.32026, 22.88137 41.99930... | North Macedonia | 3329.6 | 3502.1 | 2839.3 | 3381.3 | 3900.0 | 3051.1 | 3859.2 | ... | 2061044.0 | 2064032.0 | 2067471.0 | 2070226.0 | 2072490.0 | 2074502.0 | 2076217.0 | 2076694.0 | 2072531.0 | 2065092.0 |
165 | SRB | POLYGON ((18.82982 45.90887, 18.82984 45.90888... | Serbia | 4958.8 | 4751.4 | 3701.7 | 5157.8 | 5960.6 | 4787.3 | 6165.9 | ... | 7199077.0 | 7164132.0 | 7130576.0 | 7095383.0 | 7058322.0 | 7020858.0 | 6982604.0 | 6945235.0 | 6899126.0 | 6844078.0 |
166 | MNE | POLYGON ((20.07070 42.58863, 19.80161 42.50009... | Montenegro | 3321.4 | 3305.7 | 2638.8 | 3770.5 | 3451.5 | 3146.5 | 3261.7 | ... | 620601.0 | 621207.0 | 621810.0 | 622159.0 | 622303.0 | 622373.0 | 622227.0 | 622028.0 | 621306.0 | 620173.0 |
167 | TTO | POLYGON ((-61.68000 10.76000, -61.10500 10.890... | Trinidad and Tobago | 1667.4 | 1639.5 | 1471.7 | 1610.5 | 1329.4 | 1110.4 | 1444.3 | ... | 1344814.0 | 1353708.0 | 1362337.0 | 1370332.0 | 1377563.0 | 1384060.0 | 1389841.0 | 1394969.0 | 1399491.0 | 1403374.0 |
168 | SSD | POLYGON ((30.83385 3.50917, 29.95350 4.17370, ... | South Sudan | NaN | NaN | 705.2 | 765.6 | 1253.7 | 907.3 | 879.1 | ... | 10113648.0 | 10355030.0 | 10554882.0 | 10715657.0 | 10832520.0 | 10910774.0 | 10975924.0 | 11062114.0 | 11193729.0 | 11381377.0 |
169 rows × 28 columns
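Because `merge` defaults to an inner join, any country code that is missing from either frame silently disappears from the result. A minimal sketch of how this could be checked with the `indicator` option follows; it uses toy frames (`shapes`, `stats`, and the code `XXX` are hypothetical), not the real `earth` GeoDataFrame.

```python
import pandas as pd

# Toy stand-ins: 'shapes' plays the role of earth, 'stats' of pop_yield
shapes = pd.DataFrame({"Country Code": ["FJI", "TZA", "XXX"]})
stats = pd.DataFrame({
    "Country Code": ["FJI", "TZA", "AFE"],
    "2021_pop": [902899.0, 61498438.0, 694665117.0],
})

# An outer join with indicator=True adds a '_merge' column recording whether
# each row was found in the left frame only, the right frame only, or both.
check = shapes.merge(stats, on="Country Code", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Country Code", "_merge"]]
print(unmatched)
```

Regional aggregates such as AFE have no polygon, so losing them is expected; the check is mainly useful for spotting real countries whose codes differ between sources.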
Task 1.2.1
For the year 2021, generate choropleth maps of cereal yield for only the countries having a population less than or equal to 67326569. Very briefly interpret the generated map.
My dataframe is now ready, so I can begin to answer the questions, filtering as each one requires.
earth_yield2.columns
Index(['Country Code', 'geometry', 'Country Name_yield', '2010_yield', '2011_yield', '2012_yield', '2013_yield', '2014_yield', '2015_yield', '2016_yield', '2017_yield', '2018_yield', '2019_yield', '2020_yield', '2021_yield', 'Country Name_pop', '2010_pop', '2011_pop', '2012_pop', '2013_pop', '2014_pop', '2015_pop', '2016_pop', '2017_pop', '2018_pop', '2019_pop', '2020_pop', '2021_pop'], dtype='object')
# I will drop all null values at this point
earth_yield2.dropna(inplace= True)
#filter for 2021
earth_yield_2021 = earth_yield2[['Country Code', 'geometry', 'Country Name_yield', '2021_yield', '2021_pop']]
earth_yield_2021 # This dataframe has every country with its population and yield for 2021. I shall filter according to each task from here
Country Code | geometry | Country Name_yield | 2021_yield | 2021_pop | |
---|---|---|---|---|---|
0 | FJI | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... | Fiji | 3887.3 | 902899.0 |
1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... | Tanzania | 1651.1 | 61498438.0 |
2 | CAN | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... | Canada | 3078.3 | 38246108.0 |
3 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | United States | 8268.0 | 331893745.0 |
4 | KAZ | POLYGON ((87.35997 49.21498, 86.59878 48.54918... | Kazakhstan | 1049.0 | 19002586.0 |
... | ... | ... | ... | ... | ... |
163 | BIH | POLYGON ((18.56000 42.65000, 17.67492 43.02856... | Bosnia and Herzegovina | 4422.1 | 3263459.0 |
164 | MKD | POLYGON ((22.38053 42.32026, 22.88137 41.99930... | North Macedonia | 3537.8 | 2065092.0 |
165 | SRB | POLYGON ((18.82982 45.90887, 18.82984 45.90888... | Serbia | 5768.6 | 6844078.0 |
166 | MNE | POLYGON ((20.07070 42.58863, 19.80161 42.50009... | Montenegro | 3330.7 | 620173.0 |
167 | TTO | POLYGON ((-61.68000 10.76000, -61.10500 10.890... | Trinidad and Tobago | 1588.1 | 1403374.0 |
164 rows × 5 columns
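Calling `dropna()` on the whole frame removes any country with a gap in any year, which is why the row count falls from 169 to 164 even though only the 2021 columns are mapped here (South Sudan, for example, is dropped because its 2010 and 2011 yields are missing). If only the 2021 values matter, one option is to restrict the check with `subset`. This is a sketch on toy data with hypothetical country codes, not the method used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame: country B lacks an early yield value but has complete 2021 data
df = pd.DataFrame({
    "Country Code": ["A", "B", "C"],
    "2010_yield": [1500.0, np.nan, 900.0],
    "2021_yield": [1700.0, 1100.0, np.nan],
    "2021_pop": [5.0e6, 1.1e7, 2.0e6],
})

drop_any = df.dropna()                                    # drops B and C
drop_2021 = df.dropna(subset=["2021_yield", "2021_pop"])  # drops only C
print(len(drop_any), len(drop_2021))
```

The trade-off is that later tasks spanning several years would then need their own null handling.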
#filter for population less than or equal to 67326569
pop_67326569 =earth_yield_2021[earth_yield_2021['2021_pop']<=67326569]
I am going to sort by 2021 yield and then by 2021 population. This shows the top and bottom countries and supports my explanation.
pop_67326569.sort_values(by='2021_yield', ascending= False) #sort descending by 2021 cereal yield
Country Code | geometry | Country Name_yield | 2021_yield | 2021_pop | |
---|---|---|---|---|---|
81 | ARE | POLYGON ((51.57952 24.24550, 51.75744 24.29407... | United Arab Emirates | 26226.2 | 9991083.0 |
85 | OMN | MULTIPOLYGON (((55.20834 22.70833, 55.23449 23... | Oman | 16461.4 | 5223376.0 |
83 | KWT | POLYGON ((47.97452 29.97582, 48.18319 29.53448... | Kuwait | 11216.7 | 4328553.0 |
133 | NZL | MULTIPOLYGON (((176.88582 -40.06598, 176.50802... | New Zealand | 8728.4 | 5122600.0 |
130 | IRL | POLYGON ((-6.19788 53.86757, -6.03299 53.15316... | Ireland | 8606.5 | 5028230.0 |
... | ... | ... | ... | ... | ... |
13 | SDN | POLYGON ((24.56737 8.22919, 23.80581 8.66632, ... | Sudan | 566.8 | 44909351.0 |
47 | NAM | POLYGON ((19.89577 -24.76779, 19.89473 -28.461... | Namibia | 517.3 | 2587344.0 |
11 | SOM | POLYGON ((41.58513 -1.68325, 40.99300 -0.85829... | Somalia | 502.6 | 16359500.0 |
77 | GMB | POLYGON ((-16.71373 13.59496, -15.62460 13.623... | Gambia, The | 490.7 | 2486937.0 |
52 | NER | POLYGON ((14.85130 22.86295, 15.09689 21.30852... | Niger | 349.6 | 25130810.0 |
143 rows × 5 columns
pop_67326569.sort_values(by='2021_pop', ascending= False) #sort descending by 2021 population
Country Code | geometry | Country Name_yield | 2021_yield | 2021_pop | |
---|---|---|---|---|---|
139 | GBR | MULTIPOLYGON (((-6.19788 53.86757, -6.95373 54... | United Kingdom | 6966.8 | 67326569.0 |
1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... | Tanzania | 1651.1 | 61498438.0 |
22 | ZAF | POLYGON ((16.34498 -28.57671, 16.82402 -28.082... | South Africa | 5124.7 | 60041996.0 |
137 | ITA | MULTIPOLYGON (((10.44270 46.89355, 11.04856 46... | Italy | 5562.8 | 59066225.0 |
12 | KEN | POLYGON ((39.20222 -4.67677, 37.76690 -3.67712... | Kenya | 1487.6 | 54985702.0 |
... | ... | ... | ... | ... | ... |
145 | BRN | POLYGON ((115.45071 5.44773, 115.40570 4.95523... | Brunei Darussalam | 2885.2 | 441532.0 |
36 | BLZ | POLYGON ((-89.14308 17.80832, -89.15091 17.955... | Belize | 4289.5 | 404915.0 |
18 | BHS | MULTIPOLYGON (((-78.98000 26.79000, -78.51000 ... | Bahamas, The | 8419.2 | 396914.0 |
86 | VUT | MULTIPOLYGON (((167.21680 -15.89185, 167.84488... | Vanuatu | 609.5 | 314464.0 |
131 | NCL | POLYGON ((165.77999 -21.08000, 166.59999 -21.7... | New Caledonia | 7810.9 | 272620.0 |
143 rows × 5 columns
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "black")
pop_67326569.plot(ax=ax, column="2021_yield", legend=True, legend_kwds={"label": "cereal yield for only the countries having a population less than or equal to 67326569","orientation":"horizontal"}, cmap='Set1')
plt.show()
pop_67326569.explore(column='2021_yield', # make choropleth based on the "2021_yield" column
tooltip=["Country Name_yield",'2021_yield','2021_pop'], # show "Country Name_yield",'2021_yield','2021_pop' values in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="openstreetmap", # I shall use "openstreetmap" tiles
cmap="Set1", # use "Set1" matplotlib colormap
legend=True,
style_kwds=dict(color="black"), # use black outline
zoom_control=True,
legend_kwds={"label": "cereal yield for only the countries having a population less than or equal to 67326569","orientation":"horizontal"}
)
Brief Explanation
- The United Arab Emirates (26226.2 kg per hectare), Oman (16461.4 kg per hectare), and Kuwait (11216.7 kg per hectare) rank first, second, and third for cereal yield in 2021 among these countries. All three are in Asia.
- Oceania is represented in the top five by New Zealand at 8728.4 kg per hectare.
- Europe is represented in the top five by the Republic of Ireland at 8606.5 kg per hectare.
- African countries are at the bottom: the five lowest yields belong to Sudan, Namibia, Somalia, The Gambia, and Niger.
Policy implications and lessons learned.
The UAE, Oman, and Kuwait are striving to achieve self-sufficiency in food production on a sustainable scale. The United Kingdom, despite having the largest population in this cohort, is not among the top 5. It should study the models of the top countries in cereal production and begin to implement them.
Given the political unrest around the world, shifting alliances, and the uncertainty of the global market, relying on cereal imports and on the economic principles of specialization and international trade may spell doom when push comes to shove.
Self-sufficiency in food production is a clarion call for the United Kingdom and for all countries.
Task 1.2.2
For the year 2021, generate choropleth maps of cereal yield for only the countries having a population greater than or equal to 331,893,745. Very briefly interpret the generated map.
#filter for population greater than or equal to 331893745
pop_331893745 = earth_yield_2021[earth_yield_2021['2021_pop']>=331893745]
pop_331893745
Country Code | geometry | Country Name_yield | 2021_yield | 2021_pop | |
---|---|---|---|---|---|
3 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | United States | 8268.0 | 3.318937e+08 |
95 | IND | POLYGON ((97.32711 28.26158, 97.40256 27.88254... | India | 3478.8 | 1.393409e+09 |
136 | CHN | MULTIPOLYGON (((109.47521 18.19770, 108.65521 ... | China | 6320.8 | 1.412360e+09 |
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "black")
pop_331893745.plot(ax=ax, column="2021_yield", legend=True, legend_kwds={"label": "Cereal yield for the countries having a population greater than or equal to 331,893,745","orientation":"horizontal"}, cmap='Set1')
plt.show()
pop_331893745.explore(column='2021_yield',
tooltip=["Country Name_yield",'2021_yield','2021_pop'], # show "Country Name_yield", '2021_yield', '2021_pop' values in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="openstreetmap", # use "openstreetmap" tiles
cmap="Set1", # use "Set1" matplotlib colormap
legend=True,
style_kwds=dict(color="black"), # use black outline
zoom_control=True,
#zoom_start = 1,
#scrollWheelZoom=False
)
Brief Explanation
The map shows that:
The United States, in gray, has a population of about 332 million people and a cereal yield of 8268 kg per hectare. This yield is only fair given its population.
China, in yellow, has a cereal yield of 6320.8 kg per hectare and a population of 1.412 billion people as of 2021.
I am somewhat concerned about India. I do not think that a cereal yield of 3478.8 kg per hectare is adequate for a population of 1.393 billion. All hands have to be on deck to increase cereal yield and avoid Thomas Robert Malthus's 'doom' theory, in which population grows geometrically while food supply grows only arithmetically (Intelligent Economist, 2020).
I am aware that population control measures could be controversial, but increasing cereal yield is never a mistake.
Task 1.2.3
For the year 2021, generate choropleth maps of cereal yield for only the countries having a population between 10269022 and 1393409034. Very briefly interpret the generated map.
pop_10269022 = earth_yield_2021[(earth_yield_2021["2021_pop"]>= 10269022) & (earth_yield_2021["2021_pop"] <=1393409034)]
pop_10269022.sort_values(by='2021_yield', ascending= False) #sort descending by 2021 cereal yield
Country Code | geometry | Country Name_yield | 2021_yield | 2021_pop | |
---|---|---|---|---|---|
3 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | United States | 8268.0 | 331893745.0 |
126 | BEL | POLYGON ((6.15666 50.80372, 6.04307 50.12805, ... | Belgium | 7906.2 | 11587882.0 |
127 | NLD | POLYGON ((6.90514 53.48216, 7.09205 53.14404, ... | Netherlands | 7872.3 | 17533405.0 |
40 | FRA | MULTIPOLYGON (((-51.65780 4.15623, -52.24934 3... | France | 7170.9 | 67499343.0 |
157 | EGY | POLYGON ((36.86623 22.00000, 32.90000 22.00000... | Egypt, Arab Rep. | 7132.5 | 104258327.0 |
... | ... | ... | ... | ... | ... |
14 | TCD | POLYGON ((23.83766 19.58047, 23.88689 15.61084... | Chad | 814.3 | 16914985.0 |
153 | YEM | POLYGON ((52.00001 19.00000, 52.78218 17.34974... | Yemen, Rep. | 791.8 | 30490639.0 |
13 | SDN | POLYGON ((24.56737 8.22919, 23.80581 8.66632, ... | Sudan | 566.8 | 44909351.0 |
11 | SOM | POLYGON ((41.58513 -1.68325, 40.99300 -0.85829... | Somalia | 502.6 | 16359500.0 |
52 | NER | POLYGON ((14.85130 22.86295, 15.09689 21.30852... | Niger | 349.6 | 25130810.0 |
87 rows × 5 columns
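The two-sided population filter can also be written with `Series.between`, which is inclusive at both ends by default. A small sketch on toy data follows; countries A to E are hypothetical, and two of the populations are placed exactly on the task's bounds to show the inclusive behaviour.

```python
import pandas as pd

# Hypothetical countries A to E; A and B sit exactly on the task's bounds
pops = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "2021_pop": [10269022.0, 1393409034.0, 1412360000.0, 902899.0, 331893745.0],
})

low, high = 10_269_022, 1_393_409_034

# Explicit comparison, as in the cell above
mask_explicit = (pops["2021_pop"] >= low) & (pops["2021_pop"] <= high)
# Equivalent one-liner; inclusive="both" is the default
mask_between = pops["2021_pop"].between(low, high)

selected = pops[mask_between]
print(selected)
```

Both masks select the same rows, so the choice is purely one of readability.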
I will also sort by 2021 population to aid my discussion
pop_10269022.sort_values(by='2021_pop', ascending= False) #sort descending by 2021 population
Country Code | geometry | Country Name_yield | 2021_yield | 2021_pop | |
---|---|---|---|---|---|
95 | IND | POLYGON ((97.32711 28.26158, 97.40256 27.88254... | India | 3478.8 | 1.393409e+09 |
3 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | United States | 8268.0 | 3.318937e+08 |
7 | IDN | MULTIPOLYGON (((141.00021 -2.60015, 141.01706 ... | Indonesia | 5351.3 | 2.763618e+08 |
99 | PAK | POLYGON ((77.83745 35.49401, 76.87172 34.65354... | Pakistan | 3564.9 | 2.251999e+08 |
26 | BRA | POLYGON ((-53.37366 -33.76838, -53.65054 -33.2... | Brazil | 4478.7 | 2.139934e+08 |
... | ... | ... | ... | ... | ... |
149 | CZE | POLYGON ((15.01700 51.10667, 15.49097 50.78473... | Czechia | 6113.0 | 1.070345e+07 |
120 | GRC | MULTIPOLYGON (((26.29000 35.29999, 26.16500 35... | Greece | 4272.7 | 1.066457e+07 |
107 | SWE | POLYGON ((11.02737 58.85615, 11.46827 59.43239... | Sweden | 5064.7 | 1.041581e+07 |
128 | PRT | POLYGON ((-9.03482 41.88057, -8.67195 42.13469... | Portugal | 5379.7 | 1.029942e+07 |
80 | JOR | POLYGON ((35.54567 32.39399, 35.71992 32.70919... | Jordan | 2290.1 | 1.026902e+07 |
87 rows × 5 columns
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "black")
pop_10269022.plot(ax=ax, column='2021_yield', legend=True, legend_kwds={"label": 'cereal yield for only the countries having a population between 10269022 and 1393409034', "orientation":"horizontal"}, cmap='Set1')
plt.show()
pop_10269022.explore(column='2021_yield', # make choropleth based on "2021_yield" column
tooltip=["Country Name_yield",'2021_yield','2021_pop'], # show "Country Name_yield",'2021_yield','2021_pop' in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="openstreetmap", # use "openstreetmap" tiles
cmap="Set1", # use "Set1" matplotlib colormap
legend=True,
style_kwds=dict(color="black"), # use black outline
zoom_control=True,
#zoom_start = 1,
#scrollWheelZoom=False
)
Brief Explanation:
From the dataframes and maps, I can observe that:
- The United States of America (gray) comes first in this category with a cereal yield of 8268 kg per hectare and a population of about 332 million.
- The European countries of Belgium at 7906.2 kg per hectare, the Netherlands at 7872.3 kg per hectare, and France at 7170.9 kg per hectare occupy the 2nd, 3rd, and 4th positions, respectively.
- Africa is represented in 5th position by Egypt, at 7132.5 kg per hectare.
- Africa is also well represented at the bottom of the list: Chad, Sudan, Somalia, and Niger are among the five lowest yields. The countries at the bottom are experiencing or have recently experienced civil wars and internal crises.
- India has the highest population in this category, but its yield (3478.8 kg per hectare) is well below that of Portugal (5379.7 kg per hectare), the country with the second-lowest population. This is a policy area of concern for India if it is to avoid the 'doom' of the Malthus theory of population growth.
Task 1.2.4
Plot (scatter or line plot) the percentage change in cereal yield from 2011 to 2021, for the country having the highest population in 2021. In this question, you must consider the cereal yield for each year between 2011 and 2021. Very briefly interpret the generated plot.
earth_yield2.columns # inspect the parent dataframe for this question
Index(['Country Code', 'geometry', 'Country Name_yield', '2010_yield', '2011_yield', '2012_yield', '2013_yield', '2014_yield', '2015_yield', '2016_yield', '2017_yield', '2018_yield', '2019_yield', '2020_yield', '2021_yield', 'Country Name_pop', '2010_pop', '2011_pop', '2012_pop', '2013_pop', '2014_pop', '2015_pop', '2016_pop', '2017_pop', '2018_pop', '2019_pop', '2020_pop', '2021_pop'], dtype='object')
#filter for useful columns and cereal yield from 2010 to 2021 (2010 is kept only as the base year for the 2011 percentage change)
earth_yield_2010_2021 = earth_yield2[['Country Code', 'geometry', 'Country Name_yield', '2021_pop', '2010_yield', '2011_yield',
'2012_yield', '2013_yield', '2014_yield', '2015_yield', '2016_yield',
'2017_yield', '2018_yield', '2019_yield', '2020_yield', '2021_yield']]
earth_yield_2010_2021
Country Code | geometry | Country Name_yield | 2021_pop | 2010_yield | 2011_yield | 2012_yield | 2013_yield | 2014_yield | 2015_yield | 2016_yield | 2017_yield | 2018_yield | 2019_yield | 2020_yield | 2021_yield | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FJI | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... | Fiji | 902899.0 | 2871.1 | 2821.3 | 2423.4 | 2999.5 | 4475.4 | 3001.0 | 3000.9 | 3000.9 | 3002.2 | 3353.4 | 3665.7 | 3887.3 |
1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... | Tanzania | 61498438.0 | 1647.9 | 1390.4 | 1314.8 | 1418.0 | 1524.8 | 1449.4 | 1570.4 | 1692.0 | 1938.5 | 1883.3 | 1698.4 | 1651.1 |
2 | CAN | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... | Canada | 38246108.0 | 3501.1 | 3552.4 | 3456.3 | 4160.4 | 3647.0 | 3673.0 | 4239.1 | 4104.8 | 3914.7 | 4010.8 | 4095.4 | 3078.3 |
3 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | United States | 331893745.0 | 6978.1 | 6803.5 | 5911.9 | 7300.9 | 7638.1 | 7430.1 | 8614.2 | 8281.3 | 8196.4 | 8006.1 | 8145.3 | 8268.0 |
4 | KAZ | POLYGON ((87.35997 49.21498, 86.59878 48.54918... | Kazakhstan | 19002586.0 | 804.1 | 1688.6 | 865.0 | 1164.9 | 1172.7 | 1278.1 | 1347.7 | 1355.0 | 1359.8 | 1154.5 | 1288.6 | 1049.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
163 | BIH | POLYGON ((18.56000 42.65000, 17.67492 43.02856... | Bosnia and Herzegovina | 3263459.0 | 3853.0 | 3724.9 | 3000.6 | 4026.9 | 3977.4 | 3812.1 | 5191.7 | 3732.1 | 5487.9 | 5354.7 | 6050.7 | 4422.1 |
164 | MKD | POLYGON ((22.38053 42.32026, 22.88137 41.99930... | North Macedonia | 2065092.0 | 3329.6 | 3502.1 | 2839.3 | 3381.3 | 3900.0 | 3051.1 | 3859.2 | 2807.5 | 3714.6 | 3554.0 | 3664.2 | 3537.8 |
165 | SRB | POLYGON ((18.82982 45.90887, 18.82984 45.90888... | Serbia | 6844078.0 | 4958.8 | 4751.4 | 3701.7 | 5157.8 | 5960.6 | 4787.3 | 6165.9 | 3967.9 | 6130.6 | 6126.9 | 6559.4 | 5768.6 |
166 | MNE | POLYGON ((20.07070 42.58863, 19.80161 42.50009... | Montenegro | 620173.0 | 3321.4 | 3305.7 | 2638.8 | 3770.5 | 3451.5 | 3146.5 | 3261.7 | 3288.0 | 3311.8 | 3166.8 | 3260.0 | 3330.7 |
167 | TTO | POLYGON ((-61.68000 10.76000, -61.10500 10.890... | Trinidad and Tobago | 1403374.0 | 1667.4 | 1639.5 | 1471.7 | 1610.5 | 1329.4 | 1110.4 | 1444.3 | 1627.5 | 1691.7 | 1529.7 | 1520.3 | 1588.1 |
164 rows × 16 columns
# I will use the .idxmax() method to get the highest population in 2021
high_pop_2021 = earth_yield_2010_2021.loc[earth_yield_2010_2021['2021_pop'].idxmax()]
high_pop_2021 # Highest population is China, 1412360000.0.
Country Code CHN geometry MULTIPOLYGON (((109.47520958866365 18.19770091... Country Name_yield China 2021_pop 1412360000.0 2010_yield 5526.1 2011_yield 5709.4 2012_yield 5827.1 2013_yield 5894.1 2014_yield 5893.2 2015_yield 5985.7 2016_yield 6017.6 2017_yield 6111.3 2018_yield 6125.4 2019_yield 6265.9 2020_yield 6314.2 2021_yield 6320.8 Name: 136, dtype: object
high_pop_2021= high_pop_2021.iloc[4:] # keep only the cereal yield values (2010 to 2021; 2010 is the base year for the 2011 change)
high_pop_2021
2010_yield 5526.1 2011_yield 5709.4 2012_yield 5827.1 2013_yield 5894.1 2014_yield 5893.2 2015_yield 5985.7 2016_yield 6017.6 2017_yield 6111.3 2018_yield 6125.4 2019_yield 6265.9 2020_yield 6314.2 2021_yield 6320.8 Name: 136, dtype: object
high_pop_2021.pct_change()
2010_yield NaN 2011_yield 0.033170 2012_yield 0.020615 2013_yield 0.011498 2014_yield -0.000153 2015_yield 0.015696 2016_yield 0.005329 2017_yield 0.015571 2018_yield 0.002307 2019_yield 0.022937 2020_yield 0.007708 2021_yield 0.001045 Name: 136, dtype: float64
#Plotting percentage change in cereal yield from 2011 to 2021
high_pop_2021.pct_change().plot()
<Axes: >
Brief Explanation
- `pct_change()` returns fractions, so the values above are multiplied by 100 to read them as percentages.
- The year-on-year change in China's cereal yield was 3.32% in 2011.
- Growth slowed steadily between 2011 and 2013, and in 2014 it turned slightly negative at -0.015%.
- There was quite a sharp rise in 2015 (1.57%) and a drop again in 2016 (0.53%).
- It rose by almost the same amount in 2017 as it declined in 2016, returning to 1.56%. That is, the gradient of the fall in 2016 is almost the same as the gradient of the rise in 2017.
- There is a sharp rise between 2018 (0.23%) and 2019 (2.29%).
- From 2019 to 2021, the growth rate fell steadily, although the yield itself kept rising each year.
- It closed at 0.10% in 2021 after a series of falls and rises within the period under consideration.
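The conversion from fractions to percentages can be sketched as follows. The yields are copied from the output above; the variable names and the overall 2011 to 2021 figure are additions for illustration and are not part of the original notebook.

```python
import pandas as pd

# China's cereal yield (kg per hectare), copied from the output above
china_yield = pd.Series(
    [5526.1, 5709.4, 5827.1, 5894.1, 5893.2, 5985.7,
     6017.6, 6111.3, 6125.4, 6265.9, 6314.2, 6320.8],
    index=range(2010, 2022),
    name="yield",
)

# pct_change() returns fractions; multiply by 100 for percent.
# 2010 only serves as the base year, so its NaN row is dropped afterwards.
pct = china_yield.pct_change().mul(100).dropna()

# Overall change between the first and last year of the task window
overall = (china_yield.loc[2021] / china_yield.loc[2011] - 1) * 100

print(pct.round(2))
print(f"2011 to 2021 overall: {overall:.2f}%")
```

Plotting `pct` instead of the raw fractions lets the y-axis be labelled directly in percent.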
Task 1.2.5
Present a scatter plot between the mean population of each country and the mean cereal yield from the year 2011 until 2021. Very briefly interpret the generated plot, particularly looking for any correlation (if present) among the plotted variables. In this question, you must consider each year between 2011 and 2021 to find the mean population and mean cereal yield.
earth_yield2.columns
Index(['Country Code', 'geometry', 'Country Name_yield', '2010_yield', '2011_yield', '2012_yield', '2013_yield', '2014_yield', '2015_yield', '2016_yield', '2017_yield', '2018_yield', '2019_yield', '2020_yield', '2021_yield', 'Country Name_pop', '2010_pop', '2011_pop', '2012_pop', '2013_pop', '2014_pop', '2015_pop', '2016_pop', '2017_pop', '2018_pop', '2019_pop', '2020_pop', '2021_pop'], dtype='object')
I will drop the 2010 columns. They were used to explain percentage change in crop yield only.
earth_yield2.drop(['2010_yield', '2010_pop'], axis=1, inplace=True)
earth_yield2.columns
Index(['Country Code', 'geometry', 'Country Name_yield', '2011_yield', '2012_yield', '2013_yield', '2014_yield', '2015_yield', '2016_yield', '2017_yield', '2018_yield', '2019_yield', '2020_yield', '2021_yield', 'Country Name_pop', '2011_pop', '2012_pop', '2013_pop', '2014_pop', '2015_pop', '2016_pop', '2017_pop', '2018_pop', '2019_pop', '2020_pop', '2021_pop'], dtype='object')
#Get the mean cereal yield and add a Mean_yield column. Note: iloc[:, 3:13] selects positions 3 to 12, i.e. 2011_yield to 2020_yield; 2021_yield (position 13) is not included in this mean
earth_yield2['Mean_yield']=earth_yield2.iloc[:,3:13].mean(axis=1)
earth_yield2[:5]
Country Code | geometry | Country Name_yield | 2011_yield | 2012_yield | 2013_yield | 2014_yield | 2015_yield | 2016_yield | 2017_yield | ... | 2013_pop | 2014_pop | 2015_pop | 2016_pop | 2017_pop | 2018_pop | 2019_pop | 2020_pop | 2021_pop | Mean_yield | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FJI | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... | Fiji | 2821.3 | 2423.4 | 2999.5 | 4475.4 | 3001.0 | 3000.9 | 3000.9 | ... | 865602.0 | 866447.0 | 868632.0 | 872406.0 | 877460.0 | 883490.0 | 889955.0 | 896444.0 | 902899.0 | 3174.37 |
1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... | Tanzania | 1390.4 | 1314.8 | 1418.0 | 1524.8 | 1449.4 | 1570.4 | 1692.0 | ... | 48483132.0 | 49960563.0 | 51482638.0 | 53049231.0 | 54660345.0 | 56313444.0 | 58005461.0 | 59734213.0 | 61498438.0 | 1588.00 |
2 | CAN | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... | Canada | 3552.4 | 3456.3 | 4160.4 | 3647.0 | 3673.0 | 4239.1 | 4104.8 | ... | 35082954.0 | 35437435.0 | 35702908.0 | 36109487.0 | 36545236.0 | 37065084.0 | 37601230.0 | 38037204.0 | 38246108.0 | 3885.39 |
3 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | United States | 6803.5 | 5911.9 | 7300.9 | 7638.1 | 7430.1 | 8614.2 | 8281.3 | ... | 316059947.0 | 318386329.0 | 320738994.0 | 323071755.0 | 325122128.0 | 326838199.0 | 328329953.0 | 331501080.0 | 331893745.0 | 7632.78 |
4 | KAZ | POLYGON ((87.35997 49.21498, 86.59878 48.54918... | Kazakhstan | 1688.6 | 865.0 | 1164.9 | 1172.7 | 1278.1 | 1347.7 | 1355.0 | ... | 17035551.0 | 17288285.0 | 17542806.0 | 17794055.0 | 18037776.0 | 18276452.0 | 18513673.0 | 18755666.0 | 19002586.0 | 1267.49 |
5 rows × 27 columns
earth_yield2.columns
Index(['Country Code', 'geometry', 'Country Name_yield', '2011_yield', '2012_yield', '2013_yield', '2014_yield', '2015_yield', '2016_yield', '2017_yield', '2018_yield', '2019_yield', '2020_yield', '2021_yield', 'Country Name_pop', '2011_pop', '2012_pop', '2013_pop', '2014_pop', '2015_pop', '2016_pop', '2017_pop', '2018_pop', '2019_pop', '2020_pop', '2021_pop', 'Mean_yield'], dtype='object')
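#Same for population. Note: iloc[:, 15:25] selects 2011_pop to 2020_pop; 2021_pop (position 25) is not included in this mean
earth_yield2['Mean_pop']=earth_yield2.iloc[:,15:25].mean(axis=1)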
earth_yield2[:2]
Country Code | geometry | Country Name_yield | 2011_yield | 2012_yield | 2013_yield | 2014_yield | 2015_yield | 2016_yield | 2017_yield | ... | 2014_pop | 2015_pop | 2016_pop | 2017_pop | 2018_pop | 2019_pop | 2020_pop | 2021_pop | Mean_yield | Mean_pop | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FJI | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... | Fiji | 2821.3 | 2423.4 | 2999.5 | 4475.4 | 3001.0 | 3000.9 | 3000.9 | ... | 866447.0 | 868632.0 | 872406.0 | 877460.0 | 883490.0 | 889955.0 | 896444.0 | 902899.0 | 3174.37 | 874895.2 |
1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... | Tanzania | 1390.4 | 1314.8 | 1418.0 | 1524.8 | 1449.4 | 1570.4 | 1692.0 | ... | 49960563.0 | 51482638.0 | 53049231.0 | 54660345.0 | 56313444.0 | 58005461.0 | 59734213.0 | 61498438.0 | 1588.00 | 52441558.0 |
2 rows × 28 columns
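Positional slices are easy to get off by one. With the column order shown above, positions 3 to 12 hold `2011_yield` to `2020_yield`, and the reported `Mean_yield` for Fiji (3174.37) matches the ten-year mean without 2021. Selecting columns by name avoids this. The sketch below uses Fiji's yields from the earlier table in a one-row toy frame; `years` and `yield_cols` are hypothetical helper names.

```python
import pandas as pd

years = range(2011, 2022)                    # 2011 to 2021 inclusive
yield_cols = [f"{y}_yield" for y in years]   # eleven column names

# Fiji's yields copied from the table above, as a one-row toy frame
fiji = pd.DataFrame(
    [[2821.3, 2423.4, 2999.5, 4475.4, 3001.0, 3000.9,
      3000.9, 3002.2, 3353.4, 3665.7, 3887.3]],
    columns=yield_cols,
    index=["FJI"],
)

mean_2011_2021 = fiji[yield_cols].mean(axis=1)        # all eleven years
mean_2011_2020 = fiji[yield_cols[:-1]].mean(axis=1)   # what a ten-column slice covers
print(mean_2011_2021.round(2), mean_2011_2020.round(2))
```

The same pattern with `f"{y}_pop"` would give the population columns.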
Select useful columns:
earth_mean = earth_yield2[['geometry','Country Name_yield','Mean_yield', 'Mean_pop']]
earth_mean.sort_values(by='Mean_yield', ascending= False) #sort descending by mean yield
geometry | Country Name_yield | Mean_yield | Mean_pop | |
---|---|---|---|---|
81 | POLYGON ((51.57952 24.24550, 51.75744 24.29407... | United Arab Emirates | 25734.23 | 9390343.5 |
83 | POLYGON ((47.97452 29.97582, 48.18319 29.53448... | Kuwait | 12671.55 | 3819773.3 |
85 | MULTIPOLYGON (((55.20834 22.70833, 55.23449 23... | Oman | 12099.03 | 4286476.7 |
126 | POLYGON ((6.15666 50.80372, 6.04307 50.12805, ... | Belgium | 8760.31 | 11295471.1 |
127 | POLYGON ((6.90514 53.48216, 7.09205 53.14404, ... | Netherlands | 8385.30 | 17023700.7 |
... | ... | ... | ... | ... |
86 | MULTIPOLYGON (((167.21680 -15.89185, 167.84488... | Vanuatu | 604.37 | 274734.8 |
13 | POLYGON ((24.56737 8.22919, 23.80581 8.66632, ... | Sudan | 600.36 | 39462148.6 |
46 | POLYGON ((29.43219 -22.09131, 28.01724 -22.827... | Botswana | 533.02 | 2160123.9 |
52 | POLYGON ((14.85130 22.86295, 15.09689 21.30852... | Niger | 498.37 | 20500747.4 |
47 | POLYGON ((19.89577 -24.76779, 19.89473 -28.461... | Namibia | 426.54 | 2341771.5 |
164 rows × 4 columns
earth_mean.sort_values(by='Mean_pop', ascending= False) #sort descending by mean population
geometry | Country Name_yield | Mean_yield | Mean_pop | |
---|---|---|---|---|
136 | MULTIPOLYGON (((109.47521 18.19770, 108.65521 ... | China | 6014.39 | 1.381980e+09 |
95 | POLYGON ((97.32711 28.26158, 97.40256 27.88254... | India | 3084.68 | 1.316492e+09 |
3 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... | United States | 7632.78 | 3.215510e+08 |
7 | MULTIPOLYGON (((141.00021 -2.60015, 141.01706 ... | Indonesia | 5149.45 | 2.596911e+08 |
26 | POLYGON ((-53.37366 -33.76838, -53.65054 -33.2... | Brazil | 4786.45 | 2.052148e+08 |
... | ... | ... | ... | ... |
145 | POLYGON ((115.45071 5.44773, 115.40570 4.95523... | Brunei Darussalam | 1657.45 | 4.165801e+05 |
18 | MULTIPOLYGON (((-78.98000 26.79000, -78.51000 ... | Bahamas, The | 7870.76 | 3.763192e+05 |
36 | POLYGON ((-89.14308 17.80832, -89.15091 17.955... | Belize | 3462.48 | 3.643453e+05 |
86 | MULTIPOLYGON (((167.21680 -15.89185, 167.84488... | Vanuatu | 604.37 | 2.747348e+05 |
131 | POLYGON ((165.77999 -21.08000, 166.59999 -21.7... | New Caledonia | 5758.90 | 2.667010e+05 |
164 rows × 4 columns
INVESTIGATING RELATIONSHIP
I will have a working hypothesis:
H0: There is no significant relationship between the mean population and the mean yield.
Please note that I have taken mean yield as my dependent variable (Y-axis) and mean population as my independent variable (X-axis).
earth_mean.plot(kind='scatter', x='Mean_pop', y='Mean_yield')
<Axes: xlabel='Mean_pop', ylabel='Mean_yield'>
I will plot this with plotly.express for a better view and easier explanation.
import plotly.express as px
fig = px.scatter(earth_mean,x='Mean_pop', y='Mean_yield', color= 'Country Name_yield')
fig.show()
Points are too clustered at the bottom-left, so I will cut off the outliers of mean population (at 300 million) and mean yield (at 10,000 kg per hectare).
cut_outliers= earth_mean.loc[(earth_mean.Mean_pop<300000000) & (earth_mean.Mean_yield<10000)]
fig = px.scatter(cut_outliers,x='Mean_pop', y='Mean_yield', color= 'Country Name_yield')
fig.show()
Points are easier to view now, and no clear relationship is visible, so on visual inspection I fail to reject my null hypothesis at this point. However, I shall attempt a spatial view of possible relationships.
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "black")
earth_mean.plot(ax=ax, column='Mean_yield', legend=True, legend_kwds={"label":'Mean cereal yield of countries between 2011 and 2021', "orientation":"horizontal"}, cmap='Set1')
plt.show()
earth_mean.explore(column='Mean_yield', # make choropleth based on the "Mean_yield" column
tooltip=['Country Name_yield','Mean_yield','Mean_pop',], # show 'Country Name_yield', 'Mean_yield', 'Mean_pop' values in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="openstreetmap", # use "openstreetmap" tiles
cmap="Set1", # use "Set1" matplotlib colormap
legend=True,
style_kwds=dict(color="black"), # use black outline
zoom_control=True,
zoom_start = 1,
)
Brief Explanation
- I did not find a significant relationship in values between the mean population and the mean cereal yield.
- There is, however, a spatial pattern: cereal yield is regionalized.
- Africa is the worst-performing region in cereal yield.
- No North American country is at the bottom in cereal yield.
- Bolivia is the only South American country at the bottom in cereal production within the period under consideration.
- The United Arab Emirates has maintained a consistent lead in cereal yield over the years.
- Kuwait and Oman are the 2nd and 3rd best-performing countries in cereal yield, respectively.
- Belgium and the Netherlands are the 4th and 5th best-performing countries in the world, and they are the 1st and 2nd in Europe, respectively.
- New Caledonia, despite its low population, is a top-performing country in cereal yield per hectare.
- African countries need to increase their cereal yield to prevent extreme hunger and improve their balance of trade.
TASK 2: Geospatial Sentiment Analysis Using Social Media Data
INTRODUCTION
In this part, I will apply geospatial sentiment analysis to Twitter data using the Python library TextBlob. I am provided with a dataset consisting of tweets relevant to cryptocurrency.
Task 2.1: Data Pre-processing
Instruction
Using a set of suitable Python libraries, randomly retrieve 500 tweets where user locations are available. You should also filter out the irrelevant characters, symbols, hashtags, URLs etc. from the tweets to avoid any possible masking of the actual sentiment associated with the tweets. From this point onward you should use the processed tweet data for all the subsequent analyses.
I will start by installing textblob.
TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing such as sentiment analysis. It is built on top of NLTK and Pattern (TextBlob: Simplified Text Processing — TextBlob 0.16.0 Documentation, 2023; Muhammad, 2023)
!pip install textblob
Requirement already satisfied: textblob in c:\users\user\anaconda3\lib\site-packages (0.17.1)
Requirement already satisfied: nltk>=3.1 in c:\users\user\anaconda3\lib\site-packages (from textblob) (3.8.1)
Requirement already satisfied: click in c:\users\user\anaconda3\lib\site-packages (from nltk>=3.1->textblob) (8.0.4)
Requirement already satisfied: joblib in c:\users\user\anaconda3\lib\site-packages (from nltk>=3.1->textblob) (1.2.0)
Requirement already satisfied: regex>=2021.8.3 in c:\users\user\anaconda3\lib\site-packages (from nltk>=3.1->textblob) (2022.7.9)
Requirement already satisfied: tqdm in c:\users\user\anaconda3\lib\site-packages (from nltk>=3.1->textblob) (4.65.0)
Requirement already satisfied: colorama in c:\users\user\anaconda3\lib\site-packages (from click->nltk>=3.1->textblob) (0.4.6)
Next, I import my Python libraries. Matplotlib is a Python visualization tool. Pandas will help me create and manage dataframes.
from textblob import TextBlob
import matplotlib.pyplot as plt
import pandas as pd
The Bitcoin tweet file is huge, with more than 4 million rows, so I will load only 1 million of them, which I treat as a fair representation of the full dataset. Note that nrows reads the first 1 million rows of the file rather than a random sample.
bitcoin =pd.read_csv(r"Bitcoin_tweets.csv", nrows=1000000)
C:\Users\User\AppData\Local\Temp\ipykernel_8832\894283730.py:1: DtypeWarning: Columns (5,7,8,12) have mixed types. Specify dtype option on import or set low_memory=False.
bitcoin
index | user_name | user_location | user_description | user_created | user_followers | user_friends | user_favourites | user_verified | date | text | hashtags | source | is_retweet |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | DeSota Wilson | Atlanta, GA | Biz Consultant, real estate, fintech, startups... | 39929.836910 | 8534 | 7605 | 4838.0 | FALSE | 44237.9993518519 | Blue Ridge Bank shares halted by NYSE after #b... | ['bitcoin'] | Twitter Web App | False |
1 | CryptoND | NaN | 😎 BITCOINLIVE is a Dutch platform aimed at inf... | 43755.841782 | 6769 | 1532 | 25483.0 | FALSE | 44237.9991666667 | 😎 Today, that's this #Thursday, we will do a "... | ['Thursday', 'Btc', 'wallet', 'security'] | Twitter for Android | False |
2 | Tdlmatias | London, England | IM Academy : The best #forex, #SelfEducation, ... | 41953.451817 | 128 | 332 | 924.0 | FALSE | 44237.9963888889 | Guys evening, I have read this article about B... | NaN | Twitter Web App | False |
3 | Crypto is the future | NaN | I will post a lot of buying signals for BTC tr... | 43736.700139 | 625 | 129 | 14.0 | FALSE | 44237.9962152778 | $BTC A big chance in a billion! Price: \487264... | ['Bitcoin', 'FX', 'BTC', 'crypto'] | dlvr.it | False |
4 | Alex Kirchmaier 🇦🇹🇸🇪 #FactsSuperspreader | Europa | Co-founder @RENJERJerky | Forbes 30Under30 | I... | 42403.552720 | 1249 | 1472 | 10482.0 | FALSE | 44237.9959027778 | This network is secured by 9 508 nodes as of t... | ['BTC'] | Twitter Web App | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
999995 | Bob Lji ∞/21 🟠 | Interlaken, Schweiz | DYOR + DCA + HODL #BITCOIN NoAlts+NoCryptos+No... | 44355.652176 | 391 | 1558 | 1243.0 | False | 44423.537072 | #wtfhappenedin1971? \n#Bitcoin will fix this! | ['wtfhappenedin1971', 'Bitcoin'] | Twitter for iPhone | False |
999996 | MoonTrip | Blockchain | Crypto Enthusiast | 44289.838113 | 384 | 491 | 4771.0 | False | 44423.537049 | @CryptoNadine @BabyArabiaBSC #babyarabia massi... | ['babyarabia', 'bitcoin', 'dev', 'sundayvibes'... | Twitter for Android | False |
999997 | I like Tea and #BSV | NaN | I like Tea and Buying #BSV | 44315.271470 | 0 | 0 | 2.0 | False | 44423.536944 | @mcuban Are you serious or part of the scam??\... | ['BSV', 'BITCOIN', 'BSV', 'greentechnology'] | Twitter for Android | False |
999998 | The Blockchain Advisor™ | Illinois, USA | Bridging the Gap Between Traditional Investing... | 40259.683646 | 970 | 854 | 9134.0 | False | 44423.536725 | Why is #Bitcoin going to ascend to a global re... | ['Bitcoin'] | Twitter Web App | False |
999999 | redforthebest | NaN | Creator of #CosmoHeads & #MintingClassics & #... | 44313.291424 | 2357 | 4571 | 1448.0 | False | 44423.536528 | #Bitcoin in #QRthings Collection\n(Only 0.03 e... | ['Bitcoin', 'QRthings', 'Crypto', 'Ethereum', ... | Twitter for Android | False |
1000000 rows × 13 columns
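Because nrows keeps only the first rows of the file, the loaded tweets come from one end of the file. If a uniform random subset of the whole file were wanted without loading it all into memory, reservoir sampling is one option. The sketch below is an illustration under assumptions, not the method used in this notebook: reservoir_sample is a hypothetical helper, and a small in-memory CSV stands in for Bitcoin_tweets.csv.

```python
import csv
import io
import random

def reservoir_sample(rows, k, seed=0):
    """Return k rows chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1)
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample

# Small in-memory CSV standing in for the real tweet file (assumption)
fake_csv = io.StringIO(
    "user_location,text\n"
    + "".join(f"place{i},tweet number {i}\n" for i in range(1000))
)
picked = reservoir_sample(csv.DictReader(fake_csv), k=5)
print([row["text"] for row in picked])
```

With the real file, the StringIO object would be replaced by an open file handle, and k by the number of rows wanted.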
# First draw 1,500 tweets at random; the extra rows compensate for those that will be dropped because of missing values
bitcoin = bitcoin.sample(n=1500)
bitcoin
index | user_name | user_location | user_description | user_created | user_followers | user_friends | user_favourites | user_verified | date | text | hashtags | source | is_retweet |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
904542 | probtclife ⚡ | NaN | #Bitcoin | 44203.483310 | 27 | 142 | 4992.0 | False | 44427.516875 | @GaryGensler @FoxBusiness #Bitcoin fixes this | ['Bitcoin'] | Twitter Web App | False |
222839 | J***** T****** | Mainz | Get free $DFI worth 30$ @cakedefi, use the fol... | 44107.647488 | 44 | 194 | 3930.0 | False | 44368.595752 | @CaptainCryptoHD @Cryptanzee Thanks for the qu... | ['Bitcoin'] | Twitter for Android | False |
75023 | Maximus Monius | NaN | Crypto News, Technical Analysis, Tips & Tricks... | 44246.658889 | 72 | 130 | 102.0 | False | 44310.680347 | #BITCOIN UPDATE - April 24th\n\nWe seem to be ... | ['BITCOIN', 'BTC', 'ALTSEASON'] | Twitter Web App | False |
951672 | Rahul Singh Sirohi | Ghaziabad, India | Advisor & Developer in #Crypto Industry Since ... | 42906.352164 | 1406 | 1980 | 5134.0 | False | 44425.657338 | When #Bitcoin Crashed & goes to #Bearish M... | ['Bitcoin', 'Bearish', 'BTC', 'Crypto', 'AskTo... | Twitter for Android | False |
339824 | JcL.Rheendy | NaN | NaN | 44039.657245 | 40 | 1357 | 594.0 | False | 44379.606065 | @WSB_WallStreet one of the very big giveaways ... | ['giveaway', 'bitcoin', 'USDC', 'ETHEREUM', 'T... | Twitter for Android | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
912730 | BitcoinAgile | Matter Doesn't Matter | Breaking News. Bitcoin, Blockchain & Beyond. #... | 41646.994977 | 59940 | 13883 | 8737.0 | False | 44427.132847 | TA: #bitcoin Turns Red, What Could Trigger Mor... | ['bitcoin', 'btcusd', 'btcusdt', 'xbtusd'] | bitcoinagile | False |
347606 | VaxBLR | Bengaluru, India | Hourly updates on FREE and PAID 18+ and 45+ va... | 44368.364282 | 12 | 0 | 0.0 | False | 44400.375208 | 45+ #URBAN #Bengaluru #CovidVaccine Availabili... | ['URBAN', 'Bengaluru', 'CovidVaccine', 'COVISH... | VaxBlr | False |
426243 | 𝕍ℝaj 🇮🇳 | Mumbai | Beating Inflation by Equities. Penchant for Pa... | 40552.570046 | 117 | 917 | 4369.0 | False | 44399.662593 | @ramrastogi @LinkedIn Sir I think inexorable c... | ['Bitcoin'] | Twitter for Android | False |
366435 | thank u, next | Greece | NaN | 40998.724236 | 699 | 1954 | 96384.0 | False | 44400.478669 | The opening ceremony is so boring like people ... | ['OlympicGames'] | Twitter for iPhone | False |
386290 | Mantas | NaN | Crypto enthusiast 🪙🚀🌕🐂 | 43099.398368 | 16 | 175 | 999.0 | False | 44401.31206 | Bullish. #BTC #Bitcoin | ['BTC', 'Bitcoin'] | Twitter for iPhone | False |
1500 rows × 13 columns
bitcoin.isna().sum()
user_name 0
user_location 710
user_description 163
user_created 0
user_followers 0
user_friends 0
user_favourites 0
user_verified 0
date 0
text 0
hashtags 31
source 3
is_retweet 0
dtype: int64
bitcoin.dropna(inplace= True)
bitcoin.isna().sum()
user_name 0
user_location 0
user_description 0
user_created 0
user_followers 0
user_friends 0
user_favourites 0
user_verified 0
date 0
text 0
hashtags 0
source 0
is_retweet 0
dtype: int64
bitcoin.shape
(737, 13)
bitcoin #inspect DF
index | user_name | user_location | user_description | user_created | user_followers | user_friends | user_favourites | user_verified | date | text | hashtags | source | is_retweet |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
222839 | J***** T****** | Mainz | Get free $DFI worth 30$ @cakedefi, use the fol... | 44107.647488 | 44 | 194 | 3930.0 | False | 44368.595752 | @CaptainCryptoHD @Cryptanzee Thanks for the qu... | ['Bitcoin'] | Twitter for Android | False |
951672 | Rahul Singh Sirohi | Ghaziabad, India | Advisor & Developer in #Crypto Industry Since ... | 42906.352164 | 1406 | 1980 | 5134.0 | False | 44425.657338 | When #Bitcoin Crashed & goes to #Bearish M... | ['Bitcoin', 'Bearish', 'BTC', 'Crypto', 'AskTo... | Twitter for Android | False |
788279 | 🔔system'cRe5520' | AWS eu-west-1a Ireland Region | The channel breakout trading strategy bot for ... | 41062.324120 | 158 | 16 | 63.0 | False | 44417.625035 | strategy: 5010HL1h atr20d: 2196.69\n\n09 Aug 2... | ['BTC', 'BitMEX'] | system'cRe5520' | False |
456359 | Cryptorphic | Moon | #Bitcoin Certified Technical Analyst I Margin ... | 43283.686458 | 4176 | 33 | 249.0 | False | 44398.774907 | IMHO Do not long or Short anything now with le... | ['Crypto', 'Btc', 'Bitcoin'] | Twitter for iPhone | False |
741240 | DoopieCash® | The Hague, The Netherlands | ▫ Professional Crypto & FX Trader ▫ Technical ... | 43203.412604 | 21365 | 209 | 4714.0 | False | 44417.600694 | Time to melt 🔥🔥🔥\n\n#Bitcoin on fire\n\n#BTC ... | ['Bitcoin', 'BTC', 'crypto'] | Twitter Web App | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
356131 | JJ | Raleigh, NC | nobody really cares what I put here. | 40875.925961 | 306 | 462 | 8626.0 | False | 44393.935093 | So many dumb replies. The #CovidVaccine does n... | ['CovidVaccine', 'VaccineEducation'] | Twitter Web App | False |
665361 | Andre phill | Ontario, Canada | Masters at business Administration-MBA at Ohio... | 44403.469086 | 8 | 49 | 12.0 | FALSE | 44405.1377083333 | @APompliano Trust has an unlimited value, The ... | ['BitcoinCash', 'money', 'dollar', 'btc'] | Twitter for iPhone | False |
912730 | BitcoinAgile | Matter Doesn't Matter | Breaking News. Bitcoin, Blockchain & Beyond. #... | 41646.994977 | 59940 | 13883 | 8737.0 | False | 44427.132847 | TA: #bitcoin Turns Red, What Could Trigger Mor... | ['bitcoin', 'btcusd', 'btcusdt', 'xbtusd'] | bitcoinagile | False |
347606 | VaxBLR | Bengaluru, India | Hourly updates on FREE and PAID 18+ and 45+ va... | 44368.364282 | 12 | 0 | 0.0 | False | 44400.375208 | 45+ #URBAN #Bengaluru #CovidVaccine Availabili... | ['URBAN', 'Bengaluru', 'CovidVaccine', 'COVISH... | VaxBlr | False |
426243 | 𝕍ℝaj 🇮🇳 | Mumbai | Beating Inflation by Equities. Penchant for Pa... | 40552.570046 | 117 | 917 | 4369.0 | False | 44399.662593 | @ramrastogi @LinkedIn Sir I think inexorable c... | ['Bitcoin'] | Twitter for Android | False |
737 rows × 13 columns
bitcoin['user_location'] #inspect location
222839 Mainz
951672 Ghaziabad, India
788279 AWS eu-west-1a Ireland Region
456359 Moon
741240 The Hague, The Netherlands
...
356131 Raleigh, NC
665361 Ontario, Canada
912730 Matter Doesn't Matter
347606 Bengaluru, India
426243 Mumbai
Name: user_location, Length: 737, dtype: object
I will keep only the columns that are useful to my study.
bitcoin.columns
Index(['user_name', 'user_location', 'user_description', 'user_created', 'user_followers', 'user_friends', 'user_favourites', 'user_verified', 'date', 'text', 'hashtags', 'source', 'is_retweet'], dtype='object')
bitcoin = bitcoin[['user_location', 'text', 'user_followers']]
bitcoin
index | user_location | text | user_followers |
---|---|---|---|
222839 | Mainz | @CaptainCryptoHD @Cryptanzee Thanks for the qu... | 44 |
951672 | Ghaziabad, India | When #Bitcoin Crashed & goes to #Bearish M... | 1406 |
788279 | AWS eu-west-1a Ireland Region | strategy: 5010HL1h atr20d: 2196.69\n\n09 Aug 2... | 158 |
456359 | Moon | IMHO Do not long or Short anything now with le... | 4176 |
741240 | The Hague, The Netherlands | Time to melt 🔥🔥🔥\n\n#Bitcoin on fire\n\n#BTC ... | 21365 |
... | ... | ... | ... |
356131 | Raleigh, NC | So many dumb replies. The #CovidVaccine does n... | 306 |
665361 | Ontario, Canada | @APompliano Trust has an unlimited value, The ... | 8 |
912730 | Matter Doesn't Matter | TA: #bitcoin Turns Red, What Could Trigger Mor... | 59940 |
347606 | Bengaluru, India | 45+ #URBAN #Bengaluru #CovidVaccine Availabili... | 12 |
426243 | Mumbai | @ramrastogi @LinkedIn Sir I think inexorable c... | 117 |
737 rows × 3 columns
import re
I am going to define two functions that identify and remove irrelevant characters from the tweets, which is the purpose of the 'import re' statement above. A regular expression (or RE) specifies a set of strings that match it; the functions in this module let you check if a particular string matches a given regular expression (Re: Regular Expression Operations, 2023).
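# Remove the retweet prefix, e.g. "RT @user: " (raw strings avoid invalid-escape warnings)
def remove_rt(x): return re.sub(r'RT @\w+: ', " ", x)
# Replace @mentions, any character that is not a letter, digit, space or tab, and URLs with a space
def rt(x): return re.sub(
    r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", x)
bitcoin["text"] = bitcoin.text.map(remove_rt).map(rt)
bitcoin["text"] = bitcoin.text.str.lower()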
C:\Users\User\AppData\Local\Temp\ipykernel_8832\2926551215.py:6: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\User\AppData\Local\Temp\ipykernel_8832\2926551215.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
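The SettingWithCopyWarning above most likely arises because the three-column dataframe was created by selecting columns from a larger dataframe, so pandas cannot tell whether the assignment targets a copy. A common way to avoid it is to take an explicit copy when selecting the columns. The sketch below uses a tiny made-up dataframe (the names raw and subset are illustrative) rather than the tweet data.

```python
import pandas as pd

# Tiny made-up dataframe standing in for the tweet data (assumption)
raw = pd.DataFrame({
    "user_location": ["Mainz", "Mumbai"],
    "text": ["Bullish #BTC", "Long #Bitcoin"],
    "user_followers": [44, 117],
})

# .copy() makes the column subset an independent DataFrame,
# so later assignments do not target a slice of `raw`
subset = raw[["user_location", "text"]].copy()
subset["text"] = subset["text"].str.lower()
print(subset)
```

The original dataframe is left unchanged, and the assignment on the copy runs without the warning.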
bitcoin #inspect dataset
index | user_location | text | user_followers |
---|---|---|---|
222839 | Mainz | thanks for the question my favourite one... | 44 |
951672 | Ghaziabad, India | when bitcoin crashed amp goes to bearish m... | 1406 |
788279 | AWS eu-west-1a Ireland Region | strategy 5010hl1h atr20d 2196 69 09 aug 202... | 158 |
456359 | Moon | imho do not long or short anything now with le... | 4176 |
741240 | The Hague, The Netherlands | time to melt bitcoin on fire btc cry... | 21365 |
... | ... | ... | ... |
356131 | Raleigh, NC | so many dumb replies the covidvaccine does n... | 306 |
665361 | Ontario, Canada | trust has an unlimited value the father of ... | 8 |
912730 | Matter Doesn't Matter | ta bitcoin turns red what could trigger mor... | 59940 |
347606 | Bengaluru, India | 45 urban bengaluru covidvaccine availabili... | 12 |
426243 | Mumbai | sir i think inexorable certainty that cent... | 117 |
737 rows × 3 columns
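The cleaned text above still contains fragments such as "amp", which is left over from the HTML entity &amp; once the punctuation is stripped. A variation of the cleaning step that decodes HTML entities first and collapses repeated spaces might look like the sketch below. It is an illustrative alternative, not the code used to produce the table above; clean_tweet and the example tweet are made up for the demonstration.

```python
import html
import re

RT_RE = re.compile(r"^RT @\w+: ")            # retweet prefix at the start of a tweet
URL_RE = re.compile(r"\w+://\S+")            # URLs such as https://...
MENTION_RE = re.compile(r"@\w+")             # @mentions, including underscores
NON_ALNUM_RE = re.compile(r"[^0-9A-Za-z \t]")  # anything except letters, digits, space, tab
SPACES_RE = re.compile(r"\s+")               # runs of whitespace

def clean_tweet(text):
    """Return a lower-cased tweet without entities, URLs, mentions or symbols."""
    text = html.unescape(text)          # "&amp;" becomes "&" before symbols are removed
    text = RT_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip().lower()

example = "RT @some_user: #Bitcoin &amp; #BTC up 5%! https://t.co/abc123 @other_user"
print(clean_tweet(example))  # prints: bitcoin btc up 5
```

Applied to the dataframe, this would be a single call such as bitcoin["text"].map(clean_tweet).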
print(bitcoin['user_location'].value_counts()) #inspect before geocoding
user_location
Bay Area, CA 25
United States 13
Global 12
London, England 12
Australia 11
..
Croatia 1
Bangladesh...... Khulna 1
Lewes, DE 1
florida 1
Mumbai 1
Name: count, Length: 469, dtype: int64
Task 2.2: Geocoding
Geocode on all the 500 tweets retrieved and filtered in the previous step. To perform geocoding, you must be using a Python-based tool. Once the geocoding is performed then augment the tweet data set with two extra columns. One column should contain latitude and the other one should contain longitude information corresponding to a tweet.
geopy is a Python client for several popular geocoding web services (Geopy, 2023).
geopy. (2023, November 23). PyPI. https://pypi.org/project/geopy/
pip install geopy
Requirement already satisfied: geopy in c:\users\user\anaconda3\lib\site-packages (2.4.1)
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: geographiclib<3,>=1.52 in c:\users\user\anaconda3\lib\site-packages (from geopy) (2.0)
# Import Nominatim
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="Assessment")
from geopy.extra.rate_limiter import RateLimiter
I am going to duplicate the user_location column under the name locations. The purpose is to keep a column that preserves the original user location after geocoding is applied to the dataframe.
bitcoin['locations'] = bitcoin['user_location']
C:\Users\User\AppData\Local\Temp\ipykernel_8832\3072469440.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
bitcoin
index | user_location | text | user_followers | locations |
---|---|---|---|---|
222839 | Mainz | thanks for the question my favourite one... | 44 | Mainz |
951672 | Ghaziabad, India | when bitcoin crashed amp goes to bearish m... | 1406 | Ghaziabad, India |
788279 | AWS eu-west-1a Ireland Region | strategy 5010hl1h atr20d 2196 69 09 aug 202... | 158 | AWS eu-west-1a Ireland Region |
456359 | Moon | imho do not long or short anything now with le... | 4176 | Moon |
741240 | The Hague, The Netherlands | time to melt bitcoin on fire btc cry... | 21365 | The Hague, The Netherlands |
... | ... | ... | ... | ... |
356131 | Raleigh, NC | so many dumb replies the covidvaccine does n... | 306 | Raleigh, NC |
665361 | Ontario, Canada | trust has an unlimited value the father of ... | 8 | Ontario, Canada |
912730 | Matter Doesn't Matter | ta bitcoin turns red what could trigger mor... | 59940 | Matter Doesn't Matter |
347606 | Bengaluru, India | 45 urban bengaluru covidvaccine availabili... | 12 | Bengaluru, India |
426243 | Mumbai | sir i think inexorable certainty that cent... | 117 | Mumbai |
737 rows × 4 columns
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
bitcoin['user_location'] = bitcoin['locations'].apply(geocode)
bitcoin['latitude'] = bitcoin['user_location'].apply(lambda x: x.latitude if x else None)
bitcoin['longitude'] = bitcoin['user_location'].apply(lambda x: x.longitude if x else None)
bitcoin
print(bitcoin['locations'].value_counts())
bitcoin.dropna(inplace = True)
bitcoin.isna().sum()
bitcoin
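The value counts shown earlier list 469 distinct location strings among the 737 tweets, so geocoding row by row repeats many requests. One possible refinement is to look up each distinct string once and reuse the result. The sketch below shows the idea with a stand-in geocoder so that it runs offline: geocode_unique and fake_geocode are hypothetical names, and in the notebook the function passed in would be the rate-limited geocode defined above, which returns geopy Location objects rather than tuples.

```python
def geocode_unique(locations, geocode_fn):
    """Geocode each distinct location string once and reuse the result."""
    cache = {}
    for loc in locations:
        if loc not in cache:
            try:
                cache[loc] = geocode_fn(loc)
            except Exception:
                # Treat a failed lookup (for example a timeout) as "no result"
                cache[loc] = None
    return [cache[loc] for loc in locations]

# Stand-in geocoder returning (latitude, longitude) tuples (assumption)
calls = []
KNOWN = {
    "Lisbon, Portugal": (38.7077507, -9.1365919),
    "Canada": (61.0666922, -107.991707),
}

def fake_geocode(query):
    calls.append(query)
    return KNOWN.get(query)  # None when the place is unknown

locations = ["Canada", "Lisbon, Portugal", "Canada", "Matter Doesn't Matter", "Canada"]
coords = geocode_unique(locations, fake_geocode)
latitudes = [c[0] if c else None for c in coords]
longitudes = [c[1] if c else None for c in coords]
print(len(calls), "lookups for", len(locations), "rows")
```

With a one-second delay per request, fewer lookups translate directly into a shorter run time.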
The next step is to take a sample of 500 tweets, since every tweet remaining in the bitcoin dataframe now has a geocoded location. I store the sample in bitcoin2.
bitcoin2=bitcoin.sample(n=500)
bitcoin2
bitcoin2.to_csv('bitcoin_final.csv') #save dataframe as a csv file for future reference
bitcoin2 =gpd.read_file('bitcoin_final.csv')
bitcoin2
index | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry |
---|---|---|---|---|---|---|---|---|
0 | 714437 | The Moon, Sylacauga, Talladega County, Alabama... | talked to about stacks stx and hopes the... | 39 | The Moon | 33.1584497 | -86.2385846 | None |
1 | 280261 | Lisboa, Portugal | i m so f tired of ignorant american phd econom... | 1614 | Lisbon, Portugal | 38.7077507 | -9.1365919 | None |
2 | 336653 | Muhu, Saare maakond, Eesti | long bitcoin | 43 | Moon | 58.5959044 | 23.21964608602439 | None |
3 | 872349 | Global, 81, Barber Greene Road, Don Mills, Don... | bitcoin bull cathie wood attracts big short... | 77240 | Global | 43.7283874 | -79.34914879325001 | None |
4 | 109405 | Canada | bitcoin price in us dollar btc usd btcusd ... | 250 | Canada | 61.0666922 | -107.991707 | None |
... | ... | ... | ... | ... | ... | ... | ... | ... |
495 | 898076 | Hyderabad, Bahadurpura mandal, Hyderabad Distr... | freedom 35 official bsc bitcoin via | 112 | Hyderabad, India | 17.360589 | 78.4740613 | None |
496 | 951645 | b'424c4f434b434841494e2c20d09dd0b0d0b1d0b5d180... | bitcoin btc i don t like these lower highs... | 21254 | #blockchain | 44.6465984 | 34.4007341 | None |
497 | 7974 | Sydney, Council of the City of Sydney, New Sou... | what goes up must comes down hard question ... | 26 | Sydney, New South Wales | -33.8698439 | 151.2082848 | None |
498 | 788995 | Montréal, Agglomération de Montréal, Montré... | am i the only one that likes watching the sats... | 135 | Montréal, Québec | 45.5031824 | -73.5698065 | None |
499 | 62652 | Laguna Beach, Orange County, California, Unite... | lamb is a fast safe and scalable blockchai... | 3784 | Laguna Beach, CA | 33.5426975 | -117.785366 | None |
500 rows × 8 columns
bitcoin2.reset_index(drop=True, inplace=True) #reset index for a nice looking DF
bitcoin2 #inspect
index | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry |
---|---|---|---|---|---|---|---|---|
0 | 714437 | The Moon, Sylacauga, Talladega County, Alabama... | talked to about stacks stx and hopes the... | 39 | The Moon | 33.1584497 | -86.2385846 | None |
1 | 280261 | Lisboa, Portugal | i m so f tired of ignorant american phd econom... | 1614 | Lisbon, Portugal | 38.7077507 | -9.1365919 | None |
2 | 336653 | Muhu, Saare maakond, Eesti | long bitcoin | 43 | Moon | 58.5959044 | 23.21964608602439 | None |
3 | 872349 | Global, 81, Barber Greene Road, Don Mills, Don... | bitcoin bull cathie wood attracts big short... | 77240 | Global | 43.7283874 | -79.34914879325001 | None |
4 | 109405 | Canada | bitcoin price in us dollar btc usd btcusd ... | 250 | Canada | 61.0666922 | -107.991707 | None |
... | ... | ... | ... | ... | ... | ... | ... | ... |
495 | 898076 | Hyderabad, Bahadurpura mandal, Hyderabad Distr... | freedom 35 official bsc bitcoin via | 112 | Hyderabad, India | 17.360589 | 78.4740613 | None |
496 | 951645 | b'424c4f434b434841494e2c20d09dd0b0d0b1d0b5d180... | bitcoin btc i don t like these lower highs... | 21254 | #blockchain | 44.6465984 | 34.4007341 | None |
497 | 7974 | Sydney, Council of the City of Sydney, New Sou... | what goes up must comes down hard question ... | 26 | Sydney, New South Wales | -33.8698439 | 151.2082848 | None |
498 | 788995 | Montréal, Agglomération de Montréal, Montré... | am i the only one that likes watching the sats... | 135 | Montréal, Québec | 45.5031824 | -73.5698065 | None |
499 | 62652 | Laguna Beach, Orange County, California, Unite... | lamb is a fast safe and scalable blockchai... | 3784 | Laguna Beach, CA | 33.5426975 | -117.785366 | None |
500 rows × 8 columns
The next step is to convert longitude and latitude to a point geometry and assign it the "EPSG:4326" coordinate reference system. Code source: GeoPandas Documentation (2023).
bitcoin_tweet = gpd.GeoDataFrame(
    bitcoin2, geometry=gpd.points_from_xy(bitcoin2.longitude, bitcoin2.latitude), crs="EPSG:4326"
)
bitcoin_tweet
index | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry |
---|---|---|---|---|---|---|---|---|
0 | 714437 | The Moon, Sylacauga, Talladega County, Alabama... | talked to about stacks stx and hopes the... | 39 | The Moon | 33.1584497 | -86.2385846 | POINT (-86.23858 33.15845) |
1 | 280261 | Lisboa, Portugal | i m so f tired of ignorant american phd econom... | 1614 | Lisbon, Portugal | 38.7077507 | -9.1365919 | POINT (-9.13659 38.70775) |
2 | 336653 | Muhu, Saare maakond, Eesti | long bitcoin | 43 | Moon | 58.5959044 | 23.21964608602439 | POINT (23.21965 58.59590) |
3 | 872349 | Global, 81, Barber Greene Road, Don Mills, Don... | bitcoin bull cathie wood attracts big short... | 77240 | Global | 43.7283874 | -79.34914879325001 | POINT (-79.34915 43.72839) |
4 | 109405 | Canada | bitcoin price in us dollar btc usd btcusd ... | 250 | Canada | 61.0666922 | -107.991707 | POINT (-107.99171 61.06669) |
... | ... | ... | ... | ... | ... | ... | ... | ... |
495 | 898076 | Hyderabad, Bahadurpura mandal, Hyderabad Distr... | freedom 35 official bsc bitcoin via | 112 | Hyderabad, India | 17.360589 | 78.4740613 | POINT (78.47406 17.36059) |
496 | 951645 | b'424c4f434b434841494e2c20d09dd0b0d0b1d0b5d180... | bitcoin btc i don t like these lower highs... | 21254 | #blockchain | 44.6465984 | 34.4007341 | POINT (34.40073 44.64660) |
497 | 7974 | Sydney, Council of the City of Sydney, New Sou... | what goes up must comes down hard question ... | 26 | Sydney, New South Wales | -33.8698439 | 151.2082848 | POINT (151.20828 -33.86984) |
498 | 788995 | Montréal, Agglomération de Montréal, Montré... | am i the only one that likes watching the sats... | 135 | Montréal, Québec | 45.5031824 | -73.5698065 | POINT (-73.56981 45.50318) |
499 | 62652 | Laguna Beach, Orange County, California, Unite... | lamb is a fast safe and scalable blockchai... | 3784 | Laguna Beach, CA | 33.5426975 | -117.785366 | POINT (-117.78537 33.54270) |
500 rows × 8 columns
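Two details of this conversion are easy to get wrong: the coordinates read back from the CSV appear to be stored as text, and point geometries take x = longitude first, then y = latitude. The library-free sketch below illustrates both by turning one record into a GeoJSON-style point feature, which uses the same longitude-first order. It is only an illustration: to_point_feature is a hypothetical helper, and the record reuses the Lisbon row shown above.

```python
import json

def to_point_feature(record):
    """Build a GeoJSON-style Point feature from a record with text coordinates."""
    lon = float(record["longitude"])  # x coordinate
    lat = float(record["latitude"])   # y coordinate
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinates out of range: {lon}, {lat}")
    # Everything except the coordinate columns is kept as a property
    properties = {k: v for k, v in record.items() if k not in ("longitude", "latitude")}
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }

row = {"locations": "Lisbon, Portugal", "latitude": "38.7077507", "longitude": "-9.1365919"}
feature = to_point_feature(row)
print(json.dumps(feature["geometry"]))
```

Swapping the two arguments would place Lisbon at latitude -9.1 and longitude 38.7, which is a valid coordinate pair and therefore would not raise an error, so the order is worth checking on a known row.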
Task 2.3: Polarity Analysis
Calculate the polarity values of all the tweets. For a given geographical location, if you have more than one tweet then find the average polarity value taking into consideration all the tweets generated from the same location. Using a suitable plot type (such as a geographical map), perform a geospatial visualisation of the polarities corresponding to all the tweets. Whilst you are free to choose a plot type, the visualisation must be clear and easy to understand/interpret.
# define a function that gets text polarity
def getTextPolarity(txt):
    return TextBlob(txt).sentiment.polarity
bitcoin_tweet['polarity'] = bitcoin_tweet['text'].apply(getTextPolarity)
bitcoin_tweet.head()
index | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity |
---|---|---|---|---|---|---|---|---|---|
0 | 714437 | The Moon, Sylacauga, Talladega County, Alabama... | talked to about stacks stx and hopes the... | 39 | The Moon | 33.1584497 | -86.2385846 | POINT (-86.23858 33.15845) | -0.125000 |
1 | 280261 | Lisboa, Portugal | i m so f tired of ignorant american phd econom... | 1614 | Lisbon, Portugal | 38.7077507 | -9.1365919 | POINT (-9.13659 38.70775) | -0.133333 |
2 | 336653 | Muhu, Saare maakond, Eesti | long bitcoin | 43 | Moon | 58.5959044 | 23.21964608602439 | POINT (23.21965 58.59590) | -0.050000 |
3 | 872349 | Global, 81, Barber Greene Road, Don Mills, Don... | bitcoin bull cathie wood attracts big short... | 77240 | Global | 43.7283874 | -79.34914879325001 | POINT (-79.34915 43.72839) | 0.000000 |
4 | 109405 | Canada | bitcoin price in us dollar btc usd btcusd ... | 250 | Canada | 61.0666922 | -107.991707 | POINT (-107.99171 61.06669) | 0.165000 |
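The task asks for the average polarity when several tweets share a location. One way to compute it is a pandas groupby on the location column, sketched below with made-up polarity values. The column names follow the dataframe above, while tweets, mean_by_location and mean_polarity are illustrative names and this step is an assumption about how the averaging could be done.

```python
import pandas as pd

# Made-up polarity values for a handful of tweets (assumption)
tweets = pd.DataFrame({
    "locations": ["Canada", "Canada", "Lisbon, Portugal", "Mumbai", "Mumbai"],
    "polarity": [0.25, 0.75, -0.5, 0.0, -0.5],
})

# One row per location with the mean polarity of its tweets
mean_by_location = tweets.groupby("locations", as_index=False)["polarity"].mean()

# Alternatively, attach the location mean to every tweet
tweets["mean_polarity"] = tweets.groupby("locations")["polarity"].transform("mean")

print(mean_by_location)
```

The per-location table has one point per place, which keeps overlapping markers from hiding each other on a map.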
import seaborn as sns # seaborn is a library built on top of matplotlib that aids visualisation
sns.histplot(x='polarity', data= bitcoin_tweet, color='green') #Plot polarity values count with sns
<Axes: xlabel='polarity', ylabel='Count'>
Define a function that labels the polarity values (Oluyale, 2023).
def definepolarity(x):
    if x > 0.00:
        return "Positive"
    elif x < 0.00:
        return "Negative"
    elif x == 0:
        return "Neutral"
Apply the defined function and create another column for the polarity label.
bitcoin_tweet["polarity_label"] = bitcoin_tweet["polarity"].apply(definepolarity)
bitcoin_tweet #inspect
index | field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label |
---|---|---|---|---|---|---|---|---|---|---|
0 | 714437 | The Moon, Sylacauga, Talladega County, Alabama... | talked to about stacks stx and hopes the... | 39 | The Moon | 33.1584497 | -86.2385846 | POINT (-86.23858 33.15845) | -0.125000 | Negative |
1 | 280261 | Lisboa, Portugal | i m so f tired of ignorant american phd econom... | 1614 | Lisbon, Portugal | 38.7077507 | -9.1365919 | POINT (-9.13659 38.70775) | -0.133333 | Negative |
2 | 336653 | Muhu, Saare maakond, Eesti | long bitcoin | 43 | Moon | 58.5959044 | 23.21964608602439 | POINT (23.21965 58.59590) | -0.050000 | Negative |
3 | 872349 | Global, 81, Barber Greene Road, Don Mills, Don... | bitcoin bull cathie wood attracts big short... | 77240 | Global | 43.7283874 | -79.34914879325001 | POINT (-79.34915 43.72839) | 0.000000 | Neutral |
4 | 109405 | Canada | bitcoin price in us dollar btc usd btcusd ... | 250 | Canada | 61.0666922 | -107.991707 | POINT (-107.99171 61.06669) | 0.165000 | Positive |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
495 | 898076 | Hyderabad, Bahadurpura mandal, Hyderabad Distr... | freedom 35 official bsc bitcoin via | 112 | Hyderabad, India | 17.360589 | 78.4740613 | POINT (78.47406 17.36059) | 0.000000 | Neutral |
496 | 951645 | b'424c4f434b434841494e2c20d09dd0b0d0b1d0b5d180... | bitcoin btc i don t like these lower highs... | 21254 | #blockchain | 44.6465984 | 34.4007341 | POINT (34.40073 44.64660) | 0.000000 | Neutral |
497 | 7974 | Sydney, Council of the City of Sydney, New Sou... | what goes up must comes down hard question ... | 26 | Sydney, New South Wales | -33.8698439 | 151.2082848 | POINT (151.20828 -33.86984) | -0.223611 | Negative |
498 | 788995 | Montréal, Agglomération de Montréal, Montré... | am i the only one that likes watching the sats... | 135 | Montréal, Québec | 45.5031824 | -73.5698065 | POINT (-73.56981 45.50318) | 0.250000 | Positive |
499 | 62652 | Laguna Beach, Orange County, California, Unite... | lamb is a fast safe and scalable blockchai... | 3784 | Laguna Beach, CA | 33.5426975 | -117.785366 | POINT (-117.78537 33.54270) | 0.350000 | Positive |
500 rows × 10 columns
#Plot polarity label count with sns
plt.figure(figsize=(12,4))
sns.countplot(x='polarity_label', data=bitcoin_tweet)
plt.title('Polarity Label Count')
plt.show()
bitcoin_tweet.columns
Index(['field_1', 'user_location', 'text', 'user_followers', 'locations', 'latitude', 'longitude', 'geometry', 'polarity', 'polarity_label'], dtype='object')
Filter the useful columns for map visualization
bitcoin_tweet2= bitcoin_tweet[['geometry', 'polarity', 'polarity_label']]
Plotting Polarity
Polarity Values
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "gray")
bitcoin_tweet2.plot(ax=ax, column='polarity', legend=True, legend_kwds={"label": 'Polarity Values Spread of Bitcoin Tweets', "orientation":"horizontal"}, cmap='Set1')
plt.show()
Polarity Values on Interactive map with geopandas .explore()
bitcoin_tweet2.explore(column='polarity', # make choropleth based on "polarity" column
tooltip=['polarity'], # show "polarity" value in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="openstreetmap", # use "openstreetmap" tiles
cmap="Set1", # use "Set1" matplotlib colormap
legend=True,
marker_kwds=dict(radius=5,icon=folium.Icon(icon='house-blank')),
style_kwds=dict(color="black"), # use black outline
zoom_control=True,
#zoom_start = 1,
)
Polarity Labels
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "grey")
bitcoin_tweet2.plot(ax=ax, column='polarity_label', legend=True, cmap='viridis')
plt.show()
Polarity Labels on Interactive Map with GeoPandas .explore()
bitcoin_tweet2.explore(column='polarity_label', # make choropleth based on "polarity_label" column
tooltip=['polarity_label'], # show "polarity_label" value in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="openstreetmap", # use "openstreetmap" tiles
cmap="viridis", # use "viridis" matplotlib colormap
legend=True,
marker_kwds=dict(radius=5,icon=folium.Icon(icon='house-blank')),
style_kwds=dict(color="black"), # use black outline
zoom_control=True,
#zoom_start = 1,
)
Task 2.4 Subjectivity Analysis
Calculate the subjectivity values of all the tweets. For a given geographical location, if you have more than one tweet then find the average subjectivity value taking into consideration all the tweets generated from the same location. Using a suitable plot type (such as a geographical map), perform a geospatial visualisation of the subjectivities corresponding to all the tweets. Whilst you are free to choose a plot type, the visualisation must be clear and easy to understand/interpret.
Define a function that extracts subjectivity values from the tweets(text)
def getTextSubjectivity(txt):
    # TextBlob subjectivity ranges from 0.0 (objective) to 1.0 (subjective)
    return TextBlob(txt).sentiment.subjectivity
Apply the defined function and inspect
bitcoin_tweet['subjectivity'] = bitcoin_tweet['text'].apply(getTextSubjectivity)
bitcoin_tweet
field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 714437 | The Moon, Sylacauga, Talladega County, Alabama... | talked to about stacks stx and hopes the... | 39 | The Moon | 33.1584497 | -86.2385846 | POINT (-86.23858 33.15845) | -0.125000 | Negative | 0.375000 |
1 | 280261 | Lisboa, Portugal | i m so f tired of ignorant american phd econom... | 1614 | Lisbon, Portugal | 38.7077507 | -9.1365919 | POINT (-9.13659 38.70775) | -0.133333 | Negative | 0.233333 |
2 | 336653 | Muhu, Saare maakond, Eesti | long bitcoin | 43 | Moon | 58.5959044 | 23.21964608602439 | POINT (23.21965 58.59590) | -0.050000 | Negative | 0.400000 |
3 | 872349 | Global, 81, Barber Greene Road, Don Mills, Don... | bitcoin bull cathie wood attracts big short... | 77240 | Global | 43.7283874 | -79.34914879325001 | POINT (-79.34915 43.72839) | 0.000000 | Neutral | 0.200000 |
4 | 109405 | Canada | bitcoin price in us dollar btc usd btcusd ... | 250 | Canada | 61.0666922 | -107.991707 | POINT (-107.99171 61.06669) | 0.165000 | Positive | 0.351667 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
495 | 898076 | Hyderabad, Bahadurpura mandal, Hyderabad Distr... | freedom 35 official bsc bitcoin via | 112 | Hyderabad, India | 17.360589 | 78.4740613 | POINT (78.47406 17.36059) | 0.000000 | Neutral | 0.000000 |
496 | 951645 | b'424c4f434b434841494e2c20d09dd0b0d0b1d0b5d180... | bitcoin btc i don t like these lower highs... | 21254 | #blockchain | 44.6465984 | 34.4007341 | POINT (34.40073 44.64660) | 0.000000 | Neutral | 0.000000 |
497 | 7974 | Sydney, Council of the City of Sydney, New Sou... | what goes up must comes down hard question ... | 26 | Sydney, New South Wales | -33.8698439 | 151.2082848 | POINT (151.20828 -33.86984) | -0.223611 | Negative | 0.415278 |
498 | 788995 | Montréal, Agglomération de Montréal, Montr... | am i the only one that likes watching the sats... | 135 | Montréal, Québec | 45.5031824 | -73.5698065 | POINT (-73.56981 45.50318) | 0.250000 | Positive | 0.750000 |
499 | 62652 | Laguna Beach, Orange County, California, Unite... | lamb is a fast safe and scalable blockchai... | 3784 | Laguna Beach, CA | 33.5426975 | -117.785366 | POINT (-117.78537 33.54270) | 0.350000 | Positive | 0.550000 |
500 rows × 11 columns
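The task brief asks for a single average subjectivity value wherever several tweets share a location. A minimal sketch of that aggregation follows, assuming a small hand-made frame (`toy`) that reuses the `latitude`, `longitude` and `subjectivity` column names of `bitcoin_tweet`; the sample values and the `mean_subjectivity` name are illustrative only.

```python
import pandas as pd

# Toy stand-in for bitcoin_tweet: two tweets share the London coordinates.
toy = pd.DataFrame(
    {
        "latitude": [51.5074456, 51.5074456, 38.7077507],
        "longitude": [-0.1277653, -0.1277653, -9.1365919],
        "subjectivity": [0.34, 0.50, 0.233333],
    }
)

# One row per coordinate pair, holding the mean subjectivity of its tweets.
mean_subjectivity = (
    toy.groupby(["latitude", "longitude"], as_index=False)["subjectivity"]
    .mean()
    .rename(columns={"subjectivity": "mean_subjectivity"})
)
print(mean_subjectivity)
```

Applied to the full dataset, the same grouping would need numeric coordinate columns, and the aggregated frame could then be converted back to points for mapping.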
Define a function that labels the subjectivity values (Oluyale, 2023).
def definesubjectivity(x):
    # Above 0.5 is labelled subjective, below 0.5 factual, exactly 0.5 neutral
    if x > 0.5:
        return "subjective"
    elif x < 0.5:
        return "Factual"
    elif x == 0.5:
        return "Neutral"
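To make the cut-offs of the labelling rule easy to verify, the sketch below restates it as a standalone function and checks it on a few hand-picked scores. The name `label_subjectivity` and the sample values are assumptions for illustration; the label spellings are copied from the function above.

```python
def label_subjectivity(score):
    """Label a subjectivity score in [0, 1] using a 0.5 cut-off."""
    if score > 0.5:
        return "subjective"
    if score < 0.5:
        return "Factual"
    if score == 0.5:
        return "Neutral"
    return None  # only reached when the score is NaN


samples = [0.0, 0.375, 0.5, 0.75, 1.0]
labels = [label_subjectivity(s) for s in samples]
print(list(zip(samples, labels)))
```

Only a score of exactly 0.5 is labelled "Neutral", so that category will usually be small compared with the other two.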
Apply the defined function
bitcoin_tweet["subjectivity_label"] = bitcoin_tweet["subjectivity"].apply(definesubjectivity)
Plot a histogram showing subjectivity values count
sns.histplot(x='subjectivity', data= bitcoin_tweet, color='orange')
<Axes: xlabel='subjectivity', ylabel='Count'>
Do a count plot of subjectivity label
plt.figure(figsize=(12,4))
sns.countplot(x='subjectivity_label', data=bitcoin_tweet)
plt.title('Subjectivity Label Count')
plt.show()
bitcoin_tweet.columns #inspect available columns
Index(['field_1', 'user_location', 'text', 'user_followers', 'locations', 'latitude', 'longitude', 'geometry', 'polarity', 'polarity_label', 'subjectivity', 'subjectivity_label'], dtype='object')
Choose columns necessary for map visualization
bitcoin_tweet3= bitcoin_tweet[['geometry', 'subjectivity', 'subjectivity_label']]
Map Visualisation of Subjectivity
Plotting subjectivity values
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "grey")
bitcoin_tweet3.plot(ax=ax, column='subjectivity', legend=True, legend_kwds={"label": 'Subjectivity Spread of Bitcoin Tweets', "orientation":"horizontal"}, cmap='Set1')
plt.show()
Plotting subjectivity values with interactive map using geopandas .explore()
bitcoin_tweet3.explore(column='subjectivity', # make choropleth based on "subjectivity" column
tooltip=['subjectivity'], # show "subjectivity" value in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="openstreetmap", # use "openstreetmap" tiles
cmap="Set1", # use "Set1" matplotlib colormap
legend=True,
marker_kwds=dict(radius=5,icon=folium.Icon(icon='house-blank')),
style_kwds=dict(color="black"), # use black outline
zoom_control=True,
#zoom_start = 1,
)
Plotting subjectivity label
fig, ax = plt.subplots(figsize=(20,16))
earth.plot(ax=ax, color= "grey")
bitcoin_tweet3.plot(ax=ax, column='subjectivity_label', legend=True, cmap='viridis')
plt.show()
Plotting subjectivity label with interactive map using geopandas .explore()
bitcoin_tweet3.explore(column='subjectivity_label', # make choropleth based on "subjectivity_label" column
tooltip=['subjectivity_label'], # show "subjectivity_label" value in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="openstreetmap", # use "openstreetmap" tiles
cmap="Set1", # use "Set1" matplotlib colormap
legend=True,
marker_kwds=dict(radius=5,icon=folium.Icon(icon='house-blank')),
style_kwds=dict(color="black"), # use black outline
zoom_control=True,
#zoom_start = 1,
)
Creating a GeoDataFrame from a DataFrame with coordinates
A GeoDataFrame needs a shapely object. We use geopandas points_from_xy() to transform Longitude and Latitude into a list of shapely.Point objects and set it as the geometry while creating the GeoDataFrame. (Note that points_from_xy() is an enhanced wrapper for [Point(x, y) for x, y in zip(df.Longitude, df.Latitude)].) The crs value is also set to state explicitly that the geometry holds latitude/longitude values in World Geodetic System degrees. This matters for the correct interpretation of the data, for example when plotting it together with data in other formats.
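The construction described above can be sketched as follows. This is a minimal example under stated assumptions: the two-row frame `df` is made up for illustration, with column names chosen to mirror the notebook's `latitude` and `longitude` columns, and `EPSG:4326` is assumed as the identifier for WGS 84 latitude/longitude degrees.

```python
import geopandas as gpd
import pandas as pd

# Toy coordinates; the column names mirror the notebook's latitude/longitude.
df = pd.DataFrame(
    {
        "locations": ["London, England", "Lisbon, Portugal"],
        "latitude": [51.5074456, 38.7077507],
        "longitude": [-0.1277653, -9.1365919],
    }
)

# points_from_xy takes x (longitude) first, then y (latitude).
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",  # WGS 84 latitude/longitude in degrees
)
print(gdf.crs)
print(gdf.geometry.iloc[0])
```

Passing longitude as `x` and latitude as `y` matters: swapping the two would still produce valid points, but they would be plotted in the wrong place.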
Oluyale (2023) interprets sentiment from a compound score: a score greater than or equal to 0.05 is considered positive, a score less than or equal to -0.05 is considered negative, and anything in between is considered neutral.
Oluyale, D. (2023, September 16). Sentiment Analysis using various options in Python Machine Learning. Medium. https://medium.com/@oluyaled/sentiment-analysis-using-various-options-in-python-machine-learning-aaa24ea0991c
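The threshold rule described above can be written as a small function. This is a sketch of the rule as stated in the text, not necessarily the exact function used to produce `polarity_label` in this notebook; the name `label_compound` and the `threshold` parameter are hypothetical.

```python
def label_compound(score, threshold=0.05):
    """Map a compound sentiment score to a label using symmetric cut-offs."""
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


for score in (0.35, 0.05, 0.0, -0.05, -0.223611):
    print(score, label_compound(score))
```

With these cut-offs, small scores on either side of zero fall into the neutral band instead of being counted as positive or negative.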
Task 2.5 Storify/Interpretation
In this task, use your geospatial data analytical skill to storify (in not more than 500 words) the results obtained in the preceding two tasks. Imagine yourself as a policy advisor to the UK government whose job is to update about the public sentiment related to cryptocurrency across different parts of the world. You may try to answer some of these example questions – How is the public opinion about cryptocurrency? Which locations have positive views about this issue and where can you see a vast amount of negativity? Despite having positive/negative/mixed sentiment about cryptocurrency, will you take these tweets very seriously (HINT: if the tweet originates from outside the UK, then it may not affect the government policies!)? Are the messages loud and clear? Please note that these are only suggestive questions. You are strongly recommended to not constrain your sentiment analytical skills
The storification task requires additional code and visualization.
print(bitcoin_tweet['locations'].value_counts().to_markdown())
| locations | count |
|:---|---:|
| Bay Area, CA | 19 |
| London, England | 15 |
| United States | 13 |
| Global | 10 |
| New York, NY | 10 |
| Australia | 8 |
| Matter Doesn't Matter | 6 |
| Moon | 6 |
| New York | 6 |
| Earth | 6 |
| The Moon | 5 |
| England, United Kingdom | 5 |
| United Kingdom | 5 |
| Blockchain | 5 |
| India | 5 |
| Birmingham, England | 4 |
| Worldwide | 4 |
| Europe | 4 |
| California, USA | 4 |
| Florida, USA | 4 |
| New York, USA | 4 |
| Mars | 4 |
| Trade here 👉 | 4 |
| Singapore | 4 |
| UK | 4 |
| Pakistan | 3 |
| Miami, FL | 3 |
| Chicago, IL | 3 |
| Paris, FR | 3 |
| Canada | 3 |
| Bitcoin | 3 |
| London | 3 |
| Islamabad, Pakistan | 3 |
| USA | 3 |
| World | 3 |
| Kolkata, India | 2 |
| Sankt-Peterburg | 2 |
| Türkiye | 2 |
| New York City | 2 |
| San Diego, CA | 2 |
| London, UK | 2 |
| To the Moon | 2 |
| Bangladesh | 2 |
| Internet | 2 |
| Houston, TX | 2 |
| Paris, France | 2 |
| Tennessee | 2 |
| Austin | 2 |
| tesvikiye | 2 |
| America | 2 |
| Bellevue, WA | 2 |
| Spain | 2 |
| Buffalo, NY | 2 |
| São Paulo, Brasil 🇧🇷 | 2 |
| London, England 🇬🇧 | 2 |
| Dhaka, Bangladesh | 2 |
| Minnesota | 2 |
| on an island 🇨🇦 | 2 |
| Manhattan, NY | 2 |
| Rotterdam, Nederland | 2 |
| Nova Scotia, Canada | 2 |
| Everywhere | 2 |
| Hong Kong | 2 |
| South Africa | 2 |
| Crypto World | 2 |
| Johannesburg, South Africa | 2 |
| Jersey City, New Jersey | 2 |
| Sydney, New South Wales | 2 |
| Toronto, Ontario | 2 |
| Estados Unidos | 2 |
| Kansas City, MO | 2 |
| somewhere | 2 |
| Philadelphia, PA | 2 |
| Metaverse | 1 |
| 👇YouTube Channel👇 | 1 |
| Paris | 1 |
| Parts Unknown | 1 |
| भारत | 1 |
| 127.0.0.1 | 1 |
| Islamic Republic of Iran | 1 |
| Metz / Paris, FRANCE | 1 |
| United States Of American | 1 |
| El Salvador | 1 |
| Penzance, Cornwall | 1 |
| Western Europe | 1 |
| east Godavari, Andhra Pradesh | 1 |
| New Delhi, India | 1 |
| Cambridge, MA | 1 |
| Galt’s Gulch | 1 |
| Orange County, California | 1 |
| Near You | 1 |
| Ukraine | 1 |
| Milan, Lombardy | 1 |
| Cuddapah, India | 1 |
| Bali, Indonesia | 1 |
| DE 🇩🇪 & SA🇿🇦 | 1 |
| Santa Monica, CA | 1 |
| SPORTS BETTING (THE CASINO) | 1 |
| Grindelwald, Schweiz | 1 |
| Jawa Tengah, Indonesia | 1 |
| Traveler.. | 1 |
| Texas, USA | 1 |
| Raigarh, India | 1 |
| New York, NC | 1 |
| U.S.A! | 1 |
| Northern California | 1 |
| b'd0bad0b0d0b7d0b0d185d181d182d0b0d0bd' | 1 |
| Lausanne (Switzerland) | 1 |
| Worldwide | 1 |
| Nouvelle calédonie | 1 |
| DKI Jakarta, Indonesia | 1 |
| Copenhagen | 1 |
| Bandung, Jawa Barat | 1 |
| Los Angeles | 1 |
| Morocco | 1 |
| Russia | 1 |
| Wayne, PA | 1 |
| Dublin, Ireland | 1 |
| Loyalsock Township, PA | 1 |
| Chile | 1 |
| EARTH | 1 |
| Brisbane | 1 |
| East Midlands, England | 1 |
| Brookline, MA | 1 |
| Faridpur, Bangladesh | 1 |
| Ciudad Real, Spain | 1 |
| Washington, DC | 1 |
| INDONESIA | 1 |
| Europa | 1 |
| Massachusetts | 1 |
| Toronto | 1 |
| Uranus | 1 |
| South Korea | 1 |
| Down the Rabbit Hole | 1 |
| Moon, PA | 1 |
| Glasgow, Scotland | 1 |
| London, United Kingdom | 1 |
| Sukabumi, Indonesia | 1 |
| Anaheim, CA | 1 |
| The Netherlands | 1 |
| Jaipur Rajasthan | 1 |
| Seattle, WA | 1 |
| Detroit Michigan | 1 |
| Cincinnati, OH | 1 |
| Space Mountain | 1 |
| Melbourne, Victoria | 1 |
| Telegram | 1 |
| Road Warrior | 1 |
| ¯\_(ツ)_/¯ | 1 |
| Germany | 1 |
| Ann Arbor | 1 |
| Everywhere 🗺 | 1 |
| Tamil Nadu | 1 |
| Hyderabad, India | 1 |
| #blockchain | 1 |
| Montréal, Québec | 1 |
| British Columbia, Canada | 1 |
| Florida | 1 |
| Sumatera Selatan, Indonesia | 1 |
| Columbia, SC | 1 |
| Little Rock, AR | 1 |
| Istanbul, Turkey | 1 |
| Tucson, AZ | 1 |
| South Park, CO | 1 |
| Los Angeles, CA | 1 |
| 420 Wall St, NY | 1 |
| Spratly Islands | 1 |
| Malang, Jawa Timur | 1 |
| Ondo, Nigeria | 1 |
| Texas | 1 |
| Curaçao | 1 |
| Nigeria | 1 |
| Oslo, Norway | 1 |
| Irving, Tx | 1 |
| World Wide | 1 |
| earth | 1 |
| Tegal | 1 |
| Santa Fe, New Mexico | 1 |
| Sydney, Australia | 1 |
| Bangkok, Thailand | 1 |
| Jakarta, Indonesia | 1 |
| Fairfax, VA | 1 |
| Strazburg, France | 1 |
| Los Angeles, California, USA | 1 |
| in search | 1 |
| bella ciao | 1 |
| Barcelona / Bangkok | 1 |
| International Space Station 🚀 | 1 |
| Larissa, GR | 1 |
| South East, England | 1 |
| Switzerland | 1 |
| Maui, Hawaii | 1 |
| Shanghai, China | 1 |
| Milan, Italy | 1 |
| Oklahoma City, OK | 1 |
| Busan, Republic of Korea | 1 |
| West Dhanmondi, Dhaka | 1 |
| Bumi Nusantara | 1 |
| Austin, TX | 1 |
| Brussel, België | 1 |
| Dallas, TX | 1 |
| Kyiv, Ukraine | 1 |
| Rangpur, Bangladesh | 1 |
| Glasgow, UK | 1 |
| Zug \| Berlin | 1 |
| Landgraaf, Nederland | 1 |
| here | 1 |
| Argentina | 1 |
| Washington D.C | 1 |
| Shambhala | 1 |
| Planet Earth | 1 |
| American Fork, UT | 1 |
| Salt Lake City, Utah | 1 |
| Tehran | 1 |
| Garmany | 1 |
| Utah | 1 |
| Internationalist | 1 |
| Moon | 1 |
| 대한민국 안산시 | 1 |
| Pekanbaru, Riau | 1 |
| united states | 1 |
| Alger | 1 |
| Global | 1 |
| Meta | 1 |
| Mother Earth | 1 |
| Punjab, Pakistan | 1 |
| Bareilly, India | 1 |
| @Moon | 1 |
| Turkey | 1 |
| Boston, MA | 1 |
| California, USA🇺🇸 | 1 |
| San Tan Valley, AZ | 1 |
| الولايات المتحدة الأمريكية | 1 |
| Samsun, Türkiye | 1 |
| Future | 1 |
| San Mateo, CA | 1 |
| Westland, MI | 1 |
| Miami Florida | 1 |
| Mewn | 1 |
| Guess | 1 |
| Burger-galaxy | 1 |
| Ireland | 1 |
| Pace, FL | 1 |
| Watford, Hertfordshire, UK | 1 |
| Malaysia | 1 |
| Rio de Janeiro, Brazil | 1 |
| Lewes, DE | 1 |
| Between Here & There | 1 |
| Sydney | 1 |
| Bay Area, California | 1 |
| Mumbai | 1 |
| Dubai, United Arab Emirates | 1 |
| Gazipur, Dhaka | 1 |
| Kansas City | 1 |
| Tucuman, Argentina | 1 |
| Ohio, USA | 1 |
| België | 1 |
| Washington state | 1 |
| Jackson,Mississippi,USA | 1 |
| Antwerp | 1 |
| Englewood Cliffs, NJ | 1 |
| India | 1 |
| West Bengal, India | 1 |
| İstanbul, Türkiye | 1 |
| Mumbai, India | 1 |
| Ahmedabad City, India | 1 |
| #الكرة_الارضية | 1 |
| Amsterdam, Nederland | 1 |
| Wellington City, New Zealand | 1 |
| Ankara, Türkiye | 1 |
| South West, England | 1 |
| halp, USA | 1 |
| Durham, NC | 1 |
| Texas | 1 |
| Greenville, SC | 1 |
| india | 1 |
| Wyoming | 1 |
| AZ/NH/MA | 1 |
| THE MOON | 1 |
| Out and About | 1 |
| Space | 1 |
| City of London, London | 1 |
| Windsor, Ontario | 1 |
| b'536f7574682041667269636120f09f87bff09f87a6e29da4' | 1 |
| Hamburg, Deutschland | 1 |
| Nova Friburgo, Brasil | 1 |
| Gas Fields, Louisiana | 1 |
| now | 1 |
| Lagos | 1 |
| New Orleans | 1 |
| Jaderberg | 1 |
| Chattogram , Bangladesh | 1 |
| Michigan, USA | 1 |
| 3rd rock from the sun | 1 |
| Islington, London | 1 |
| Bogor | 1 |
| Fort Worth, Texas | 1 |
| New York & Taipei | 1 |
| Venezuela | 1 |
| Indiana, USA | 1 |
| Sri lanka | 1 |
| Lisbon, Portugal | 1 |
| Laguna Beach, CA | 1 |
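The counts above treat strings that differ only in letter case or surrounding whitespace as separate locations (for example `Earth`, `EARTH` and `earth`). A minimal sketch of normalising the strings before counting follows; `normalise` is a hypothetical helper, and the short `raw` list is a hand-picked sample of values from the table, not the full column.

```python
from collections import Counter

# A few raw location strings taken from the value counts above.
raw = ["Earth", "EARTH", "earth", "Moon", "THE MOON", "The Moon", "Texas"]


def normalise(location):
    """Trim surrounding whitespace and ignore letter case."""
    return location.strip().casefold()


counts = Counter(normalise(loc) for loc in raw)
print(counts.most_common())
```

This only merges trivial variants; spellings such as `London, UK` and `London, England` would still need a geocoded field or a manual mapping to be grouped together.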
bitcoin_tweet[:3]
field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 714437 | The Moon, Sylacauga, Talladega County, Alabama... | talked to about stacks stx and hopes the... | 39 | The Moon | 33.1584497 | -86.2385846 | POINT (-86.23858 33.15845) | -0.125000 | Negative | 0.375000 | Factual |
1 | 280261 | Lisboa, Portugal | i m so f tired of ignorant american phd econom... | 1614 | Lisbon, Portugal | 38.7077507 | -9.1365919 | POINT (-9.13659 38.70775) | -0.133333 | Negative | 0.233333 | Factual |
2 | 336653 | Muhu, Saare maakond, Eesti | long bitcoin | 43 | Moon | 58.5959044 | 23.21964608602439 | POINT (23.21965 58.59590) | -0.050000 | Negative | 0.400000 | Factual |
bri= bitcoin_tweet[(bitcoin_tweet["locations"]== 'London, England')]
bri
field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
20 | 419791 | London, Greater London, England, United Kingdom | thanks for giving us such a great opportunit... | 129 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.100000e-01 | Positive | 0.340000 | Factual |
54 | 295173 | London, Greater London, England, United Kingdom | bitcoin fear index on google search statistics... | 202 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.750000e-01 | Positive | 0.500000 | Neutral |
103 | 820864 | London, Greater London, England, United Kingdom | no trade btc price is 30888 at time 08 07 21... | 1523 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
112 | 505680 | London, Greater London, England, United Kingdom | grayscale pairs with coindesk index to launch ... | 62 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
117 | 271016 | London, Greater London, England, United Kingdom | it was good while it lasted now it is time ... | 129 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.500000e-01 | Positive | 0.300000 | Factual |
234 | 513485 | London, Greater London, England, United Kingdom | grayscale investments launches defi fund now... | 62 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 1.000000e-01 | Positive | 0.000000 | Factual |
264 | 520206 | London, Greater London, England, United Kingdom | ready for elon to say some stupid shit trigger... | 63 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | -2.666667e-01 | Negative | 0.766667 | subjective |
322 | 824028 | London, Greater London, England, United Kingdom | let the fifth and final wave commence btc bi... | 7 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 1.000000 | subjective |
362 | 568086 | London, Greater London, England, United Kingdom | all details on telegram channel entry tar... | 119 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
377 | 663473 | London, Greater London, England, United Kingdom | the first ever mode post agm investor present... | 788 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.250000e-01 | Positive | 0.366667 | Factual |
378 | 272275 | London, Greater London, England, United Kingdom | open spot signals 10 ada rlc ... | 78 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 4.000000e-01 | Positive | 0.450000 | Factual |
386 | 578083 | London, Greater London, England, United Kingdom | the fiat money experiment is failing bitcoin | 18 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
398 | 624101 | London, Greater London, England, United Kingdom | huge bitcoin inflow to gemini behind the drop ... | 8249 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.775558e-17 | Positive | 0.800000 | subjective |
436 | 558763 | London, Greater London, England, United Kingdom | we don t buy bitcoin due to the new unique... | 851 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.385281e-01 | Positive | 0.594697 | subjective |
466 | 130805 | London, Greater London, England, United Kingdom | elsalvador becomes the first country to make ... | 469 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.250000e-01 | Positive | 0.266667 | Factual |
bri2 = bitcoin_tweet[(bitcoin_tweet["locations"]== 'United Kingdom')]
bri2
field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
127 | 349546 | United Kingdom | what would the government do if the youth of t... | 261 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.0 | Neutral | 0.0 | Factual |
137 | 531787 | United Kingdom | swedish man sentenced for gold backed cryptocu... | 173 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.0 | Neutral | 0.0 | Factual |
376 | 498148 | United Kingdom | privacy focused crypto is launching an inc... | 5840 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.0 | Neutral | 0.0 | Factual |
411 | 556330 | United Kingdom | it s all profit cryptocurrency made me a milli... | 2332 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.0 | Neutral | 0.0 | Factual |
460 | 412787 | United Kingdom | buy bitcoin and hold newhigh | 12 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.0 | Neutral | 0.0 | Factual |
br3 = bitcoin_tweet[(bitcoin_tweet["locations"]== 'London')]
br3
field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
36 | 891204 | London, Greater London, England, United Kingdom | good traders vs bad traders a bad trade on th... | 1463 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | -0.217857 | Negative | 0.539286 | subjective |
256 | 157176 | London, Greater London, England, United Kingdom | you still tweeting about bitcoin wait u... | 43 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.142857 | Positive | 0.267857 | Factual |
475 | 367564 | London, Greater London, England, United Kingdom | so after a very long wait tomorrow is the sta... | 76 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.267500 | Positive | 0.710000 | subjective |
br4 = bitcoin_tweet[(bitcoin_tweet["locations"]== 'London, England 🇬🇧')]
br4
field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
76 | 663255 | London, Greater London, England, United Kingdom | i m encouraged by this morning s senate bankin... | 4285 | London, England 🇬🇧 | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.0 | Neutral | 0.0 | Factual |
433 | 438568 | London, Greater London, England, United Kingdom | ta bitcoin trim losses why bulls need to ove... | 4299 | London, England 🇬🇧 | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.5 | Positive | 0.5 | Neutral |
br5 = bitcoin_tweet[(bitcoin_tweet["locations"]== 'England, United Kingdom')]
br5
field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
123 | 890689 | England, United Kingdom | live bitcoin price 46 731 an increase ... | 39938 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 0.068182 | Positive | 0.283333 | Factual |
159 | 739840 | England, United Kingdom | fair launching today 6pm utc do not miss... | 320 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 0.291667 | Positive | 0.550000 | subjective |
196 | 228331 | England, United Kingdom | bitcoin wow some people woke up thinking thi... | 7 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 0.141667 | Positive | 0.741667 | subjective |
423 | 293677 | England, United Kingdom | live bitcoin price 35 494 an increase ... | 39946 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 0.068182 | Positive | 0.283333 | Factual |
487 | 423690 | England, United Kingdom | moving up slowly inertia creeps moving up slo... | 5 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | -0.300000 | Negative | 0.400000 | Factual |
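The five filters above each select rows by an exact match on `locations`. One alternative, sketched here under stated assumptions, is a single `isin` call over a list of the same strings. The frame `toy` and the names `uk_locations` and `uk_rows` are illustrative stand-ins and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for bitcoin_tweet with only the columns needed for the filter.
toy = pd.DataFrame(
    {
        "locations": ["London, England", "Canada", "United Kingdom", "London", "Moon"],
        "polarity": [0.31, 0.165, 0.0, -0.217857, -0.05],
    }
)

# The five location strings that were filtered one by one above.
uk_locations = [
    "London, England",
    "United Kingdom",
    "London",
    "London, England 🇬🇧",
    "England, United Kingdom",
]

# Keep every row whose location exactly matches one of the listed strings.
uk_rows = toy[toy["locations"].isin(uk_locations)]
print(uk_rows)
```

Unlike stacking separately filtered frames, `isin` keeps the rows in their original index order, and other UK spellings in the data (such as `UK` or `Glasgow, Scotland`) would only be included if added to the list.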
Concatenate the dataframes on rows and call it bitcoin_tweetUK
bitcoin_tweetUK = pd.concat([bri, bri2, br3, br4, br5], axis=0)
bitcoin_tweetUK
field_1 | user_location | text | user_followers | locations | latitude | longitude | geometry | polarity | polarity_label | subjectivity | subjectivity_label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
20 | 419791 | London, Greater London, England, United Kingdom | thanks for giving us such a great opportunit... | 129 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.100000e-01 | Positive | 0.340000 | Factual |
54 | 295173 | London, Greater London, England, United Kingdom | bitcoin fear index on google search statistics... | 202 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.750000e-01 | Positive | 0.500000 | Neutral |
103 | 820864 | London, Greater London, England, United Kingdom | no trade btc price is 30888 at time 08 07 21... | 1523 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
112 | 505680 | London, Greater London, England, United Kingdom | grayscale pairs with coindesk index to launch ... | 62 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
117 | 271016 | London, Greater London, England, United Kingdom | it was good while it lasted now it is time ... | 129 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.500000e-01 | Positive | 0.300000 | Factual |
234 | 513485 | London, Greater London, England, United Kingdom | grayscale investments launches defi fund now... | 62 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 1.000000e-01 | Positive | 0.000000 | Factual |
264 | 520206 | London, Greater London, England, United Kingdom | ready for elon to say some stupid shit trigger... | 63 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | -2.666667e-01 | Negative | 0.766667 | subjective |
322 | 824028 | London, Greater London, England, United Kingdom | let the fifth and final wave commence btc bi... | 7 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 1.000000 | subjective |
362 | 568086 | London, Greater London, England, United Kingdom | all details on telegram channel entry tar... | 119 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
377 | 663473 | London, Greater London, England, United Kingdom | the first ever mode post agm investor present... | 788 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 3.250000e-01 | Positive | 0.366667 | Factual |
378 | 272275 | London, Greater London, England, United Kingdom | open spot signals 10 ada rlc ... | 78 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 4.000000e-01 | Positive | 0.450000 | Factual |
386 | 578083 | London, Greater London, England, United Kingdom | the fiat money experiment is failing bitcoin | 18 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
398 | 624101 | London, Greater London, England, United Kingdom | huge bitcoin inflow to gemini behind the drop ... | 8249 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.775558e-17 | Positive | 0.800000 | subjective |
436 | 558763 | London, Greater London, England, United Kingdom | we don t buy bitcoin due to the new unique... | 851 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.385281e-01 | Positive | 0.594697 | subjective |
466 | 130805 | London, Greater London, England, United Kingdom | elsalvador becomes the first country to make ... | 469 | London, England | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.250000e-01 | Positive | 0.266667 | Factual |
127 | 349546 | United Kingdom | what would the government do if the youth of t... | 261 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.000000e+00 | Neutral | 0.000000 | Factual |
137 | 531787 | United Kingdom | swedish man sentenced for gold backed cryptocu... | 173 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.000000e+00 | Neutral | 0.000000 | Factual |
376 | 498148 | United Kingdom | privacy focused crypto is launching an inc... | 5840 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.000000e+00 | Neutral | 0.000000 | Factual |
411 | 556330 | United Kingdom | it s all profit cryptocurrency made me a milli... | 2332 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.000000e+00 | Neutral | 0.000000 | Factual |
460 | 412787 | United Kingdom | buy bitcoin and hold newhigh | 12 | United Kingdom | 54.7023545 | -3.2765753 | POINT (-3.27658 54.70235) | 0.000000e+00 | Neutral | 0.000000 | Factual |
36 | 891204 | London, Greater London, England, United Kingdom | good traders vs bad traders a bad trade on th... | 1463 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | -2.178571e-01 | Negative | 0.539286 | subjective |
256 | 157176 | London, Greater London, England, United Kingdom | you still tweeting about bitcoin wait u... | 43 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 1.428571e-01 | Positive | 0.267857 | Factual |
475 | 367564 | London, Greater London, England, United Kingdom | so after a very long wait tomorrow is the sta... | 76 | London | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 2.675000e-01 | Positive | 0.710000 | subjective |
76 | 663255 | London, Greater London, England, United Kingdom | i m encouraged by this morning s senate bankin... | 4285 | London, England 🇬🇧 | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 0.000000e+00 | Neutral | 0.000000 | Factual |
433 | 438568 | London, Greater London, England, United Kingdom | ta bitcoin trim losses why bulls need to ove... | 4299 | London, England 🇬🇧 | 51.5074456 | -0.1277653 | POINT (-0.12777 51.50745) | 5.000000e-01 | Positive | 0.500000 | Neutral |
123 | 890689 | England, United Kingdom | live bitcoin price 46 731 an increase ... | 39938 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 6.818182e-02 | Positive | 0.283333 | Factual |
159 | 739840 | England, United Kingdom | fair launching today 6pm utc do not miss... | 320 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 2.916667e-01 | Positive | 0.550000 | subjective |
196 | 228331 | England, United Kingdom | bitcoin wow some people woke up thinking thi... | 7 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 1.416667e-01 | Positive | 0.741667 | subjective |
423 | 293677 | England, United Kingdom | live bitcoin price 35 494 an increase ... | 39946 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | 6.818182e-02 | Positive | 0.283333 | Factual |
487 | 423690 | England, United Kingdom | moving up slowly inertia creeps moving up slo... | 5 | England, United Kingdom | 52.5310214 | -1.2649062 | POINT (-1.26491 52.53102) | -3.000000e-01 | Negative | 0.400000 | Factual |
bitcoin_tweetUK.shape
(30, 12)
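One row in the output above (index 398) has a polarity of 2.775558e-17, which is floating-point noise around zero, yet it carries the Positive label. A minimal sketch of a labelling helper that treats such values as Neutral is shown here. The name `label_polarity` and the tolerance `tol` are hypothetical and are not the function used earlier in this project.

```python
def label_polarity(score: float, tol: float = 1e-9) -> str:
    """Map a TextBlob-style polarity score in [-1, 1] to a label.

    `tol` is an assumed tolerance: scores whose magnitude is at most
    `tol` are treated as zero, so rounding noise is labelled Neutral.
    """
    if score > tol:
        return "Positive"
    if score < -tol:
        return "Negative"
    return "Neutral"


# Illustrative scores taken from the polarity column printed above.
for score in (2.775558e-17, 3.25e-01, -2.178571e-01, 0.0):
    print(score, label_polarity(score))
```

With a helper like this, the near-zero score would fall in the Neutral group instead of the Positive one.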
Next, I plot the polarity label counts for the UK tweets.
plt.figure(figsize=(12,4))
sns.countplot(x='polarity_label', data=bitcoin_tweetUK)
plt.title('Polarity Label Count for UK')
plt.show()
Subjectivity Count for UK
plt.figure(figsize=(12,4))
sns.countplot(x='subjectivity_label', data=bitcoin_tweetUK)
plt.title('Subjectivity Label Count for UK')
plt.show()
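The count plots show the label distributions visually; the exact numbers behind the bars can also be tabulated. This is a minimal sketch, assuming a DataFrame with the same `polarity_label` and `subjectivity_label` columns as `bitcoin_tweetUK`. The rows in `toy_tweets` are hypothetical stand-ins, not the project's data.

```python
import pandas as pd

# Hypothetical stand-in for bitcoin_tweetUK (illustrative rows only).
toy_tweets = pd.DataFrame({
    "polarity_label": ["Positive", "Neutral", "Positive", "Negative", "Neutral"],
    "subjectivity_label": ["Factual", "Factual", "subjective", "subjective", "Factual"],
})

# Number of tweets per polarity label (the heights of the count-plot bars).
polarity_counts = toy_tweets["polarity_label"].value_counts()

# Cross-tabulate polarity against subjectivity to see both labels at once.
label_table = pd.crosstab(
    toy_tweets["polarity_label"], toy_tweets["subjectivity_label"]
)

print(polarity_counts)
print(label_table)
```

Replacing `toy_tweets` with the real DataFrame would give the figures quoted in the interpretation without reading them off the plots.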
Storify/Interpretation
Introduction
Bitcoin is a cryptocurrency that runs on a peer-to-peer network. It lets users exchange value digitally without the intervention of a third party. Bitcoin relies on solving cryptographic puzzles to produce distinct hashes, and its total supply is limited. This work sets out to review sentiments on Bitcoin.
My Review Findings:
The result of my analysis shows that there is a general awareness of Bitcoin all over the world.
The world is favourably disposed to Bitcoin; only about 60 of the 500 sampled tweets expressed negative sentiment about it.
About 230 tweets expressed positive sentiments, and 210 tweets were neutral.
I have also attempted to understand the degree of negativity expressed; about 80% expressed very low negativity, e.g., between -0.1 and -0.09.
The UK sample showed the same pattern: only 2 of the 30 sampled tweets expressed negative sentiment about Bitcoin, and the degree of negativity was again low.
16 tweets expressed positive sentiments, and 12 tweets were neutral.
I attempted to understand if these sentiments were factual or subjective.
The tweets were overwhelmingly factual; only about 130 tweets were subjective worldwide.
In the United Kingdom, only nine tweets were subjective.
Because the large majority of tweets were factual, both in the UK and worldwide, this analysis can be taken seriously.
One negative tweet in the UK came from the Brighton area, around Bexhill-on-Sea. The other came from the Leicester area, around Hinckley. The positive tweets are spread evenly across the UK.
Among the neighbouring countries to the UK, the Republic of Ireland does not have a negative tweet about Bitcoin. France and the Netherlands have one negative comment each.
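The degree of negativity mentioned in the findings can be quantified by binning the negative polarity scores. This is a minimal sketch using `pandas.cut`. The scores in `toy_polarity`, the bin edges, and the band names are all hypothetical choices for illustration, not values from this project.

```python
import pandas as pd

# Hypothetical polarity scores (not the project's data).
toy_polarity = pd.Series([-0.05, -0.30, -0.08, 0.40, 0.0, -0.60])

# Keep only the negative scores.
negative = toy_polarity[toy_polarity < 0]

# Assumed band edges; right=False makes each interval closed on the left,
# i.e. [-1.0, -0.5), [-0.5, -0.1), [-0.1, 0.0).
bins = [-1.0, -0.5, -0.1, 0.0]
bands = ["strong", "moderate", "mild"]
negativity_band = pd.cut(negative, bins=bins, labels=bands, right=False)

# Share of negative tweets falling in each band.
share = negativity_band.value_counts(normalize=True)
print(share)
```

Applied to the real polarity column, a table like `share` would state what fraction of negative tweets fall in each band.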
Conclusion and Policy Implications
I have presented the sentiment polarity and subjectivity of Bitcoin tweets from the United Kingdom and from around the world. In both samples, positive tweets far outnumber negative ones, and most of the sentiments expressed are factual. These tweets can therefore be given weight and could form a basis for policy. I suggest that:
The UK government could sponsor research on the benefits of Bitcoin and cryptocurrencies to the growth of its economy.
Bitcoin could be regarded as a digital legal tender.
Universities in the UK should be encouraged to offer Bitcoin, cryptocurrency, and cryptography as courses of study.
Scholarships and grants should be given to students who wish to study Bitcoin and its cryptographic mechanisms.
Research should be done to make Bitcoin a more secure digital currency. The negative sentiments about Bitcoin centred on fears about its security.
There is also concern that Bitcoin has no central ownership. I would advise the government to set up regulatory bodies for Bitcoin to allay these fears.
References
Crickard, P. (2018). Mastering geospatial analysis with Python: Explore GIS processing and learn to work with GeoDjango, CARTOframes and MapboxGL Jupyter. Packt Publishing.
Gabby, A. (2023). Python Sentiment Analysis using TextBlob and VADER for Glassdoor Reviews. Medium. https://medium.com/@gabya06/python-sentiment-analysis-using-textblob-and-vader-for-glassdoor-reviews-cc9632babb73
GeoPandas Documentation (2023). Geopandas.GeoDataFrame.explore — GeoPandas Docs. Available at: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html.
GeoPandas Documentation (2023). Creating a GeoDataFrame from a DataFrame with coordinates — GeoPandas Docs.
Deparkes (2016). Folium Map Tiles - deparkes. Available at: https://deparkes.co.uk/2016/06/10/folium-map-tiles/.
Intelligent Economist, (2020). Malthusian Theory Of Population - Intelligent Economist. Available at: https://www.intelligenteconomist.com/malthusian-theory/
Muhammad, U. S. (2023). A Comparison of NLTK and TextBlob for Text Analysis. Medium. https://medium.com/@umarsmuhammed/a-comparison-of-nltk-and-textblob-for-text-analysis-bd9ebcd0ecd9
Oluyale, D. (2023, September 16). Sentiment Analysis using various options in Python Machine Learning. Medium. https://medium.com/@oluyaled/sentiment-analysis-using-various-options-in-python-machine-learning-aaa24ea0991c
re — Regular expression operations. (2023). Python Documentation. Available at: https://docs.python.org/3/library/re.html Accessed: 29/11/2023
TextBlob: Simplified Text Processing — TextBlob 0.16.0 documentation. (2023). https://textblob.readthedocs.io/en/dev/