Lab4-2_BA¶

Berent Aldikacti¶

09/14/20¶

In [3]:

import pandas as pd
surveys_df = pd.read_csv("data/surveys.csv",
                         keep_default_na=False, na_values=[""])
surveys_df

Out[3]:

	record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight
0	1	7	16	1977	2	NL	M	32.0	NaN
1	2	7	16	1977	3	NL	M	33.0	NaN
2	3	7	16	1977	2	DM	F	37.0	NaN
3	4	7	16	1977	7	DM	M	36.0	NaN
4	5	7	16	1977	3	DM	M	35.0	NaN
...	...	...	...	...	...	...	...	...	...
35544	35545	12	31	2002	15	AH	NaN	NaN	NaN
35545	35546	12	31	2002	15	AH	NaN	NaN	NaN
35546	35547	12	31	2002	10	RM	F	15.0	14.0
35547	35548	12	31	2002	7	DO	M	36.0	51.0
35548	35549	12	31	2002	5	NaN	NaN	NaN	NaN

35549 rows × 9 columns

In [4]:

species_df = pd.read_csv("data/species.csv",
                         keep_default_na=False, na_values=[""])
species_df

Out[4]:

	species_id	genus	species	taxa
0	AB	Amphispiza	bilineata	Bird
1	AH	Ammospermophilus	harrisi	Rodent
2	AS	Ammodramus	savannarum	Bird
3	BA	Baiomys	taylori	Rodent
4	CB	Campylorhynchus	brunneicapillus	Bird
5	CM	Calamospiza	melanocorys	Bird
6	CQ	Callipepla	squamata	Bird
7	CS	Crotalus	scutalatus	Reptile
8	CT	Cnemidophorus	tigris	Reptile
9	CU	Cnemidophorus	uniparens	Reptile
10	CV	Crotalus	viridis	Reptile
11	DM	Dipodomys	merriami	Rodent
12	DO	Dipodomys	ordii	Rodent
13	DS	Dipodomys	spectabilis	Rodent
14	DX	Dipodomys	sp.	Rodent
15	EO	Eumeces	obsoletus	Reptile
16	GS	Gambelia	silus	Reptile
17	NL	Neotoma	albigula	Rodent
18	NX	Neotoma	sp.	Rodent
19	OL	Onychomys	leucogaster	Rodent
20	OT	Onychomys	torridus	Rodent
21	OX	Onychomys	sp.	Rodent
22	PB	Chaetodipus	baileyi	Rodent
23	PC	Pipilo	chlorurus	Bird
24	PE	Peromyscus	eremicus	Rodent
25	PF	Perognathus	flavus	Rodent
26	PG	Pooecetes	gramineus	Bird
27	PH	Perognathus	hispidus	Rodent
28	PI	Chaetodipus	intermedius	Rodent
29	PL	Peromyscus	leucopus	Rodent
30	PM	Peromyscus	maniculatus	Rodent
31	PP	Chaetodipus	penicillatus	Rodent
32	PU	Pipilo	fuscus	Bird
33	PX	Chaetodipus	sp.	Rodent
34	RF	Reithrodontomys	fulvescens	Rodent
35	RM	Reithrodontomys	megalotis	Rodent
36	RO	Reithrodontomys	montanus	Rodent
37	RX	Reithrodontomys	sp.	Rodent
38	SA	Sylvilagus	audubonii	Rabbit
39	SB	Spizella	breweri	Bird
40	SC	Sceloporus	clarki	Reptile
41	SF	Sigmodon	fulviventer	Rodent
42	SH	Sigmodon	hispidus	Rodent
43	SO	Sigmodon	ochrognathus	Rodent
44	SS	Spermophilus	spilosoma	Rodent
45	ST	Spermophilus	tereticaudus	Rodent
46	SU	Sceloporus	undulatus	Reptile
47	SX	Sigmodon	sp.	Rodent
48	UL	Lizard	sp.	Reptile
49	UP	Pipilo	sp.	Bird
50	UR	Rodent	sp.	Rodent
51	US	Sparrow	sp.	Bird
52	ZL	Zonotrichia	leucophrys	Bird
53	ZM	Zenaida	macroura	Bird

Exercise for combining dataframes¶

In [5]:

# Read in first 10 lines of surveys table
survey_sub = surveys_df.head(10)
# Grab the last 10 rows
survey_sub_last10 = surveys_df.tail(10)
# Reset the index values to the second dataframe appends properly
survey_sub_last10 = survey_sub_last10.reset_index(drop=True)
# drop=True option avoids adding new index column with old index values
survey_sub_last10

Out[5]:

	record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight
0	35540	12	31	2002	15	PB	F	26.0	23.0
1	35541	12	31	2002	15	PB	F	24.0	31.0
2	35542	12	31	2002	15	PB	F	26.0	29.0
3	35543	12	31	2002	15	PB	F	27.0	34.0
4	35544	12	31	2002	15	US	NaN	NaN	NaN
5	35545	12	31	2002	15	AH	NaN	NaN	NaN
6	35546	12	31	2002	15	AH	NaN	NaN	NaN
7	35547	12	31	2002	10	RM	F	15.0	14.0
8	35548	12	31	2002	7	DO	M	36.0	51.0
9	35549	12	31	2002	5	NaN	NaN	NaN	NaN

In [6]:

# Stack the DataFrames on top of each other
vertical_stack = pd.concat([survey_sub, survey_sub_last10], axis=0)
vertical_stack

Out[6]:

	record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight
0	1	7	16	1977	2	NL	M	32.0	NaN
1	2	7	16	1977	3	NL	M	33.0	NaN
2	3	7	16	1977	2	DM	F	37.0	NaN
3	4	7	16	1977	7	DM	M	36.0	NaN
4	5	7	16	1977	3	DM	M	35.0	NaN
5	6	7	16	1977	1	PF	M	14.0	NaN
6	7	7	16	1977	2	PE	F	NaN	NaN
7	8	7	16	1977	1	DM	M	37.0	NaN
8	9	7	16	1977	1	DM	F	34.0	NaN
9	10	7	16	1977	6	PF	F	20.0	NaN
0	35540	12	31	2002	15	PB	F	26.0	23.0
1	35541	12	31	2002	15	PB	F	24.0	31.0
2	35542	12	31	2002	15	PB	F	26.0	29.0
3	35543	12	31	2002	15	PB	F	27.0	34.0
4	35544	12	31	2002	15	US	NaN	NaN	NaN
5	35545	12	31	2002	15	AH	NaN	NaN	NaN
6	35546	12	31	2002	15	AH	NaN	NaN	NaN
7	35547	12	31	2002	10	RM	F	15.0	14.0
8	35548	12	31	2002	7	DO	M	36.0	51.0
9	35549	12	31	2002	5	NaN	NaN	NaN	NaN

In [7]:

# Place the DataFrames side by side
horizontal_stack = pd.concat([survey_sub, survey_sub_last10], axis=1)
horizontal_stack.head()

Out[7]:

	record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight	record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight
0	1	7	16	1977	2	NL	M	32.0	NaN	35540	12	31	2002	15	PB	F	26.0	23.0
1	2	7	16	1977	3	NL	M	33.0	NaN	35541	12	31	2002	15	PB	F	24.0	31.0
2	3	7	16	1977	2	DM	F	37.0	NaN	35542	12	31	2002	15	PB	F	26.0	29.0
3	4	7	16	1977	7	DM	M	36.0	NaN	35543	12	31	2002	15	PB	F	27.0	34.0
4	5	7	16	1977	3	DM	M	35.0	NaN	35544	12	31	2002	15	US	NaN	NaN	NaN

Challenge - Combine Data¶

In [8]:

df2001 = pd.read_csv('data/surveys2001.csv', keep_default_na=False, na_values=[""])
df2001.head()

Out[8]:

	Unnamed: 0	record_id	month	day	year	site_id	species_id	sex	hindfoot_length	weight
0	31710	31711	1	21	2001	1	PB	F	26.0	25.0
1	31711	31712	1	21	2001	1	DM	M	37.0	43.0
2	31712	31713	1	21	2001	1	PB	M	29.0	44.0
3	31713	31714	1	21	2001	1	DO	M	34.0	53.0
4	31714	31715	1	21	2001	2	OT	M	20.0	27.0

In [9]:

df2002 = pd.read_csv('data/surveys2002.csv', keep_default_na=False, na_values=[""])
df2002.head()

Out[9]:

	Unnamed: 0	record_id	month	day	year	site_id	species_id	sex	hindfoot_length	weight
0	33320	33321	1	12	2002	1	DM	M	38.0	44.0
1	33321	33322	1	12	2002	1	DO	M	37.0	58.0
2	33322	33323	1	12	2002	1	PB	M	28.0	45.0
3	33324	33325	1	12	2002	1	DO	M	35.0	29.0
4	33325	33326	1	12	2002	2	OT	F	20.0	26.0

In [10]:

vertical_df = pd.concat([df2001, df2002], axis=0)
vertical_df

Out[10]:

	Unnamed: 0	record_id	month	day	year	site_id	species_id	sex	hindfoot_length	weight
0	31710	31711	1	21	2001	1	PB	F	26.0	25.0
1	31711	31712	1	21	2001	1	DM	M	37.0	43.0
2	31712	31713	1	21	2001	1	PB	M	29.0	44.0
3	31713	31714	1	21	2001	1	DO	M	34.0	53.0
4	31714	31715	1	21	2001	2	OT	M	20.0	27.0
...	...	...	...	...	...	...	...	...	...	...
2073	35540	35541	12	31	2002	15	PB	F	24.0	31.0
2074	35541	35542	12	31	2002	15	PB	F	26.0	29.0
2075	35542	35543	12	31	2002	15	PB	F	27.0	34.0
2076	35546	35547	12	31	2002	10	RM	F	15.0	14.0
2077	35547	35548	12	31	2002	7	DO	M	36.0	51.0

3512 rows × 10 columns

In [11]:

by_site_sex = vertical_df.groupby(['year','sex'])
site_sex_count = by_site_sex['weight'].mean()
site_sex_count

Out[11]:

year  sex
2001  F      36.034771
      M      36.404196
2002  F      33.878761
      M      37.490506
Name: weight, dtype: float64

In [12]:

spc = site_sex_count.unstack()
s_plot = spc.plot(kind='bar', stacked=True, title="Total weight by year and sex")
s_plot.set_ylabel("Weight")
s_plot.set_xlabel("Year")

Out[12]:

Text(0.5, 0, 'Year')

In [13]:

spc.to_csv('data/weightbyyearsex.csv', index=False)
spcread = pd.read_csv('data/weightbyyearsex.csv')
spcread

Out[13]:

	F	M
0	36.034771	36.404196
1	33.878761	37.490506

Challenge - Distributions¶

In [14]:

surveys_df = pd.read_csv("data/surveys.csv",
                         keep_default_na=False, na_values=[""])
surveys_df.head()

Out[14]:

	record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight
0	1	7	16	1977	2	NL	M	32.0	NaN
1	2	7	16	1977	3	NL	M	33.0	NaN
2	3	7	16	1977	2	DM	F	37.0	NaN
3	4	7	16	1977	7	DM	M	36.0	NaN
4	5	7	16	1977	3	DM	M	35.0	NaN

In [15]:

species_df = pd.read_csv("data/species.csv",
                         keep_default_na=False, na_values=[""])
species_df.head()

Out[15]:

	species_id	genus	species	taxa
0	AB	Amphispiza	bilineata	Bird
1	AH	Ammospermophilus	harrisi	Rodent
2	AS	Ammodramus	savannarum	Bird
3	BA	Baiomys	taylori	Rodent
4	CB	Campylorhynchus	brunneicapillus	Bird

In [16]:

merged = pd.merge(left=surveys_df, right=species_df, left_on='species_id', right_on='species_id')
merged.head()

Out[16]:

	record_id	month	day	year	plot_id	species_id	sex	hindfoot_length	weight	genus	species	taxa
0	1	7	16	1977	2	NL	M	32.0	NaN	Neotoma	albigula	Rodent
1	2	7	16	1977	3	NL	M	33.0	NaN	Neotoma	albigula	Rodent
2	22	7	17	1977	15	NL	F	31.0	NaN	Neotoma	albigula	Rodent
3	38	7	17	1977	17	NL	M	33.0	NaN	Neotoma	albigula	Rodent
4	72	8	19	1977	2	NL	M	31.0	NaN	Neotoma	albigula	Rodent

In [17]:

merged_taxaplot = merged.groupby(['plot_id'])
plot_taxa_count = merged_taxaplot['taxa'].count()
plot_taxa_count.plot(kind='bar', stacked=True, title="Taxa by plot")

Out[17]:

<AxesSubplot:title={'center':'Taxa by plot'}, xlabel='plot_id'>

In [18]:

merged_taxaplot = merged.groupby(['sex','plot_id'])
plot_taxa_count = merged_taxaplot['taxa'].count()
plot_taxa_count.plot(kind='bar', stacked=True, title="Taxa by sex and plot")

Out[18]:

<AxesSubplot:title={'center':'Taxa by sex and plot'}, xlabel='sex,plot_id'>