Python: extract rows from a pandas DataFrame at every fixed time window


I have the following DataFrame:

    df =
    record_id  time
    94704      2014-03-10 07:19:19.647342
    94705      2014-03-10 07:21:44.479363
    94706      2014-03-10 07:21:45.479581
    94707      2014-03-10 07:21:54.481588
    94708      2014-03-10 07:21:55.481804
    94709      2014-03-10 07:21:56.482029
    94710      2014-03-10 07:21:57.482254
    94711      2014-03-10 07:21:58.482473
    94712      2014-03-10 07:21:59.482706
    94713      2014-03-10 07:22:00.482917
    94714      2014-03-10 07:22:01.483279
    94715      2014-03-10 07:22:02.483545
    94716      2014-03-10 07:22:03.383563
    94717      2014-03-10 07:22:04.383786
    94718      2014-03-10 07:22:09.485624
    94719      2014-03-10 07:22:10.385118
    94720      2014-03-10 07:22:11.485454
    94721      2014-03-10 07:22:12.485592
    94722      2014-03-10 07:22:15.486335
    94723      2014-03-10 07:22:16.486475
    94724      2014-03-10 07:22:17.487023
    94725      2014-03-10 07:22:18.387020
    94726      2014-03-10 07:22:19.387120
    94727      2014-03-10 07:22:20.387379
    94728      2014-03-10 07:22:22.387786
    94729      2014-03-10 07:22:23.488032
    94730      2014-03-10 07:22:24.388232
    94731      2014-03-10 07:22:30.489594

I would like to know how to create a new DataFrame that takes one row every 60 seconds, in order to reduce the size of the table.

You first need to set the index to the time column of your DataFrame. Then resample as follows:

    resampled = df.set_index('time').resample('1min').first()

    >>> resampled
                         record_id
    time
    2014-03-10 07:19:00      94704
    2014-03-10 07:20:00        NaN
    2014-03-10 07:21:00      94705
    2014-03-10 07:22:00      94713

Note the NaN at 07:20: there were no records during that interval. You can, of course, drop the NaNs if desired.

    >>> resampled.dropna()
                         record_id
    time
    2014-03-10 07:19:00      94704
    2014-03-10 07:21:00      94705
    2014-03-10 07:22:00      94713
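For anyone who wants to try this end to end, here is a minimal self-contained sketch. It uses only a handful of the rows from the question and assumes the time column starts out as strings, so it is parsed with `pd.to_datetime` first. The variable name `reduced` and the cast back to an integer dtype are my own additions for illustration, not part of the original question.

```python
import pandas as pd

# A small sample built from a few of the rows in the question.
df = pd.DataFrame({
    "record_id": [94704, 94705, 94706, 94713, 94714, 94731],
    "time": [
        "2014-03-10 07:19:19.647342",
        "2014-03-10 07:21:44.479363",
        "2014-03-10 07:21:45.479581",
        "2014-03-10 07:22:00.482917",
        "2014-03-10 07:22:01.483279",
        "2014-03-10 07:22:30.489594",
    ],
})

# resample() needs a datetime-like index, so parse the strings first
# (assumption: the column is not already a datetime dtype).
df["time"] = pd.to_datetime(df["time"])

# Take the first record in each 1-minute bin; empty bins become NaN.
resampled = df.set_index("time").resample("1min").first()

# Drop the empty bins and restore the integer dtype that NaN forced to float.
reduced = resampled.dropna().astype({"record_id": "int64"})
print(reduced)
```

If you would rather keep the last record of each minute, or some aggregate of the interval, swapping `.first()` for `.last()` or `.mean()` should work the same way.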
