Lab4-2_BA

Berent Aldikacti

09/14/20

In [3]:
import pandas as pd
surveys_df = pd.read_csv("data/surveys.csv",
                         keep_default_na=False, na_values=[""])
surveys_df
Out[3]:
record_id month day year plot_id species_id sex hindfoot_length weight
0 1 7 16 1977 2 NL M 32.0 NaN
1 2 7 16 1977 3 NL M 33.0 NaN
2 3 7 16 1977 2 DM F 37.0 NaN
3 4 7 16 1977 7 DM M 36.0 NaN
4 5 7 16 1977 3 DM M 35.0 NaN
... ... ... ... ... ... ... ... ... ...
35544 35545 12 31 2002 15 AH NaN NaN NaN
35545 35546 12 31 2002 15 AH NaN NaN NaN
35546 35547 12 31 2002 10 RM F 15.0 14.0
35547 35548 12 31 2002 7 DO M 36.0 51.0
35548 35549 12 31 2002 5 NaN NaN NaN NaN

35549 rows × 9 columns

In [4]:
species_df = pd.read_csv("data/species.csv",
                         keep_default_na=False, na_values=[""])
species_df
Out[4]:
species_id genus species taxa
0 AB Amphispiza bilineata Bird
1 AH Ammospermophilus harrisi Rodent
2 AS Ammodramus savannarum Bird
3 BA Baiomys taylori Rodent
4 CB Campylorhynchus brunneicapillus Bird
5 CM Calamospiza melanocorys Bird
6 CQ Callipepla squamata Bird
7 CS Crotalus scutalatus Reptile
8 CT Cnemidophorus tigris Reptile
9 CU Cnemidophorus uniparens Reptile
10 CV Crotalus viridis Reptile
11 DM Dipodomys merriami Rodent
12 DO Dipodomys ordii Rodent
13 DS Dipodomys spectabilis Rodent
14 DX Dipodomys sp. Rodent
15 EO Eumeces obsoletus Reptile
16 GS Gambelia silus Reptile
17 NL Neotoma albigula Rodent
18 NX Neotoma sp. Rodent
19 OL Onychomys leucogaster Rodent
20 OT Onychomys torridus Rodent
21 OX Onychomys sp. Rodent
22 PB Chaetodipus baileyi Rodent
23 PC Pipilo chlorurus Bird
24 PE Peromyscus eremicus Rodent
25 PF Perognathus flavus Rodent
26 PG Pooecetes gramineus Bird
27 PH Perognathus hispidus Rodent
28 PI Chaetodipus intermedius Rodent
29 PL Peromyscus leucopus Rodent
30 PM Peromyscus maniculatus Rodent
31 PP Chaetodipus penicillatus Rodent
32 PU Pipilo fuscus Bird
33 PX Chaetodipus sp. Rodent
34 RF Reithrodontomys fulvescens Rodent
35 RM Reithrodontomys megalotis Rodent
36 RO Reithrodontomys montanus Rodent
37 RX Reithrodontomys sp. Rodent
38 SA Sylvilagus audubonii Rabbit
39 SB Spizella breweri Bird
40 SC Sceloporus clarki Reptile
41 SF Sigmodon fulviventer Rodent
42 SH Sigmodon hispidus Rodent
43 SO Sigmodon ochrognathus Rodent
44 SS Spermophilus spilosoma Rodent
45 ST Spermophilus tereticaudus Rodent
46 SU Sceloporus undulatus Reptile
47 SX Sigmodon sp. Rodent
48 UL Lizard sp. Reptile
49 UP Pipilo sp. Bird
50 UR Rodent sp. Rodent
51 US Sparrow sp. Bird
52 ZL Zonotrichia leucophrys Bird
53 ZM Zenaida macroura Bird

Exercise for combining dataframes

In [5]:
# Read in first 10 lines of surveys table
survey_sub = surveys_df.head(10)
# Grab the last 10 rows
survey_sub_last10 = surveys_df.tail(10)
# Reset the index values to the second dataframe appends properly
survey_sub_last10 = survey_sub_last10.reset_index(drop=True)
# drop=True option avoids adding new index column with old index values
survey_sub_last10
Out[5]:
record_id month day year plot_id species_id sex hindfoot_length weight
0 35540 12 31 2002 15 PB F 26.0 23.0
1 35541 12 31 2002 15 PB F 24.0 31.0
2 35542 12 31 2002 15 PB F 26.0 29.0
3 35543 12 31 2002 15 PB F 27.0 34.0
4 35544 12 31 2002 15 US NaN NaN NaN
5 35545 12 31 2002 15 AH NaN NaN NaN
6 35546 12 31 2002 15 AH NaN NaN NaN
7 35547 12 31 2002 10 RM F 15.0 14.0
8 35548 12 31 2002 7 DO M 36.0 51.0
9 35549 12 31 2002 5 NaN NaN NaN NaN
In [6]:
# Stack the DataFrames on top of each other
vertical_stack = pd.concat([survey_sub, survey_sub_last10], axis=0)
vertical_stack
Out[6]:
record_id month day year plot_id species_id sex hindfoot_length weight
0 1 7 16 1977 2 NL M 32.0 NaN
1 2 7 16 1977 3 NL M 33.0 NaN
2 3 7 16 1977 2 DM F 37.0 NaN
3 4 7 16 1977 7 DM M 36.0 NaN
4 5 7 16 1977 3 DM M 35.0 NaN
5 6 7 16 1977 1 PF M 14.0 NaN
6 7 7 16 1977 2 PE F NaN NaN
7 8 7 16 1977 1 DM M 37.0 NaN
8 9 7 16 1977 1 DM F 34.0 NaN
9 10 7 16 1977 6 PF F 20.0 NaN
0 35540 12 31 2002 15 PB F 26.0 23.0
1 35541 12 31 2002 15 PB F 24.0 31.0
2 35542 12 31 2002 15 PB F 26.0 29.0
3 35543 12 31 2002 15 PB F 27.0 34.0
4 35544 12 31 2002 15 US NaN NaN NaN
5 35545 12 31 2002 15 AH NaN NaN NaN
6 35546 12 31 2002 15 AH NaN NaN NaN
7 35547 12 31 2002 10 RM F 15.0 14.0
8 35548 12 31 2002 7 DO M 36.0 51.0
9 35549 12 31 2002 5 NaN NaN NaN NaN
In [7]:
# Place the DataFrames side by side
horizontal_stack = pd.concat([survey_sub, survey_sub_last10], axis=1)
horizontal_stack.head()
Out[7]:
record_id month day year plot_id species_id sex hindfoot_length weight record_id month day year plot_id species_id sex hindfoot_length weight
0 1 7 16 1977 2 NL M 32.0 NaN 35540 12 31 2002 15 PB F 26.0 23.0
1 2 7 16 1977 3 NL M 33.0 NaN 35541 12 31 2002 15 PB F 24.0 31.0
2 3 7 16 1977 2 DM F 37.0 NaN 35542 12 31 2002 15 PB F 26.0 29.0
3 4 7 16 1977 7 DM M 36.0 NaN 35543 12 31 2002 15 PB F 27.0 34.0
4 5 7 16 1977 3 DM M 35.0 NaN 35544 12 31 2002 15 US NaN NaN NaN

Challenge - Combine Data

In [8]:
df2001 = pd.read_csv('data/surveys2001.csv', keep_default_na=False, na_values=[""])
df2001.head()
Out[8]:
Unnamed: 0 record_id month day year site_id species_id sex hindfoot_length weight
0 31710 31711 1 21 2001 1 PB F 26.0 25.0
1 31711 31712 1 21 2001 1 DM M 37.0 43.0
2 31712 31713 1 21 2001 1 PB M 29.0 44.0
3 31713 31714 1 21 2001 1 DO M 34.0 53.0
4 31714 31715 1 21 2001 2 OT M 20.0 27.0
In [9]:
df2002 = pd.read_csv('data/surveys2002.csv', keep_default_na=False, na_values=[""])
df2002.head()
Out[9]:
Unnamed: 0 record_id month day year site_id species_id sex hindfoot_length weight
0 33320 33321 1 12 2002 1 DM M 38.0 44.0
1 33321 33322 1 12 2002 1 DO M 37.0 58.0
2 33322 33323 1 12 2002 1 PB M 28.0 45.0
3 33324 33325 1 12 2002 1 DO M 35.0 29.0
4 33325 33326 1 12 2002 2 OT F 20.0 26.0
In [10]:
vertical_df = pd.concat([df2001, df2002], axis=0)
vertical_df
Out[10]:
Unnamed: 0 record_id month day year site_id species_id sex hindfoot_length weight
0 31710 31711 1 21 2001 1 PB F 26.0 25.0
1 31711 31712 1 21 2001 1 DM M 37.0 43.0
2 31712 31713 1 21 2001 1 PB M 29.0 44.0
3 31713 31714 1 21 2001 1 DO M 34.0 53.0
4 31714 31715 1 21 2001 2 OT M 20.0 27.0
... ... ... ... ... ... ... ... ... ... ...
2073 35540 35541 12 31 2002 15 PB F 24.0 31.0
2074 35541 35542 12 31 2002 15 PB F 26.0 29.0
2075 35542 35543 12 31 2002 15 PB F 27.0 34.0
2076 35546 35547 12 31 2002 10 RM F 15.0 14.0
2077 35547 35548 12 31 2002 7 DO M 36.0 51.0

3512 rows × 10 columns

In [11]:
by_site_sex = vertical_df.groupby(['year','sex'])
site_sex_count = by_site_sex['weight'].mean()
site_sex_count
Out[11]:
year  sex
2001  F      36.034771
      M      36.404196
2002  F      33.878761
      M      37.490506
Name: weight, dtype: float64
In [12]:
spc = site_sex_count.unstack()
s_plot = spc.plot(kind='bar', stacked=True, title="Total weight by year and sex")
s_plot.set_ylabel("Weight")
s_plot.set_xlabel("Year")
Out[12]:
Text(0.5, 0, 'Year')
In [13]:
spc.to_csv('data/weightbyyearsex.csv', index=False)
spcread = pd.read_csv('data/weightbyyearsex.csv')
spcread
Out[13]:
F M
0 36.034771 36.404196
1 33.878761 37.490506

Challenge - Distributions

In [14]:
surveys_df = pd.read_csv("data/surveys.csv",
                         keep_default_na=False, na_values=[""])
surveys_df.head()
Out[14]:
record_id month day year plot_id species_id sex hindfoot_length weight
0 1 7 16 1977 2 NL M 32.0 NaN
1 2 7 16 1977 3 NL M 33.0 NaN
2 3 7 16 1977 2 DM F 37.0 NaN
3 4 7 16 1977 7 DM M 36.0 NaN
4 5 7 16 1977 3 DM M 35.0 NaN
In [15]:
species_df = pd.read_csv("data/species.csv",
                         keep_default_na=False, na_values=[""])
species_df.head()
Out[15]:
species_id genus species taxa
0 AB Amphispiza bilineata Bird
1 AH Ammospermophilus harrisi Rodent
2 AS Ammodramus savannarum Bird
3 BA Baiomys taylori Rodent
4 CB Campylorhynchus brunneicapillus Bird
In [16]:
merged = pd.merge(left=surveys_df, right=species_df, left_on='species_id', right_on='species_id')
merged.head()
Out[16]:
record_id month day year plot_id species_id sex hindfoot_length weight genus species taxa
0 1 7 16 1977 2 NL M 32.0 NaN Neotoma albigula Rodent
1 2 7 16 1977 3 NL M 33.0 NaN Neotoma albigula Rodent
2 22 7 17 1977 15 NL F 31.0 NaN Neotoma albigula Rodent
3 38 7 17 1977 17 NL M 33.0 NaN Neotoma albigula Rodent
4 72 8 19 1977 2 NL M 31.0 NaN Neotoma albigula Rodent
In [17]:
merged_taxaplot = merged.groupby(['plot_id'])
plot_taxa_count = merged_taxaplot['taxa'].count()
plot_taxa_count.plot(kind='bar', stacked=True, title="Taxa by plot")
Out[17]:
<AxesSubplot:title={'center':'Taxa by plot'}, xlabel='plot_id'>
In [18]:
merged_taxaplot = merged.groupby(['sex','plot_id'])
plot_taxa_count = merged_taxaplot['taxa'].count()
plot_taxa_count.plot(kind='bar', stacked=True, title="Taxa by sex and plot")
Out[18]:
<AxesSubplot:title={'center':'Taxa by sex and plot'}, xlabel='sex,plot_id'>