I have the following dataframe:
df =
record_id  time
94704      2014-03-10 07:19:19.647342
94705      2014-03-10 07:21:44.479363
94706      2014-03-10 07:21:45.479581
94707      2014-03-10 07:21:54.481588
94708      2014-03-10 07:21:55.481804
94709      2014-03-10 07:21:56.482029
94710      2014-03-10 07:21:57.482254
94711      2014-03-10 07:21:58.482473
94712      2014-03-10 07:21:59.482706
94713      2014-03-10 07:22:00.482917
94714      2014-03-10 07:22:01.483279
94715      2014-03-10 07:22:02.483545
94716      2014-03-10 07:22:03.383563
94717      2014-03-10 07:22:04.383786
94718      2014-03-10 07:22:09.485624
94719      2014-03-10 07:22:10.385118
94720      2014-03-10 07:22:11.485454
94721      2014-03-10 07:22:12.485592
94722      2014-03-10 07:22:15.486335
94723      2014-03-10 07:22:16.486475
94724      2014-03-10 07:22:17.487023
94725      2014-03-10 07:22:18.387020
94726      2014-03-10 07:22:19.387120
94727      2014-03-10 07:22:20.387379
94728      2014-03-10 07:22:22.387786
94729      2014-03-10 07:22:23.488032
94730      2014-03-10 07:22:24.388232
94731      2014-03-10 07:22:30.489594
I would like to know how to create a new dataframe that takes one record every 60 seconds, in order to reduce the size of the table.
You first need to set the index to the time column of your dataframe. Then resample as follows:
>>> # recent pandas versions no longer accept how='first'; call .first() on the resampler instead
>>> resampled = df.set_index('time').resample('1min').first()
>>> resampled
                     record_id
time
2014-03-10 07:19:00      94704
2014-03-10 07:20:00        NaN
2014-03-10 07:21:00      94705
2014-03-10 07:22:00      94713
Note the NaN at 07:20, which appears because there are no records during that interval. You can, of course, drop the NaNs if desired.
>>> resampled.dropna()
                     record_id
time
2014-03-10 07:19:00      94704
2014-03-10 07:21:00      94705
2014-03-10 07:22:00      94713
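For anyone who wants to try this end to end, here is a minimal self-contained sketch of the steps above. It assumes a recent pandas version and uses only a handful of rows from the question. The variable name `reduced` and the final cast back to integers are my own additions, not part of the original answer.

```python
import pandas as pd

# A small subset of the records from the question; 'time' is parsed to datetime64.
df = pd.DataFrame({
    "record_id": [94704, 94705, 94706, 94713, 94714],
    "time": pd.to_datetime([
        "2014-03-10 07:19:19.647342",
        "2014-03-10 07:21:44.479363",
        "2014-03-10 07:21:45.479581",
        "2014-03-10 07:22:00.482917",
        "2014-03-10 07:22:01.483279",
    ]),
})

# Keep the first record in each one-minute bin; bins with no records yield NaN.
resampled = df.set_index("time").resample("1min").first()

# Drop the empty bins and cast back to integers (NaN forces the column to float).
# 'reduced' is a name chosen for this sketch.
reduced = resampled.dropna().astype({"record_id": "int64"})
print(reduced)
```

The cast is optional. It only matters if you need `record_id` as an integer again after the empty intervals have been dropped.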