Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: set_record + set_records #677

Open
thp44 opened this issue May 2, 2019 · 13 comments · May be fixed by #1494
Open

Feature Request: set_record + set_records #677

thp44 opened this issue May 2, 2019 · 13 comments · May be fixed by #1494

Comments

@thp44
Copy link

thp44 commented May 2, 2019

get_all_records()
return nicely formatted dictionaries (thanks!)

set_record() (new feature)
inset one dictionary to google sheet (values under right heading)

set_records() (new feature)
inset list of dictionaries to google sheet (values under right heading)

@thp44 thp44 changed the title Feature Request: Insert record Feature Request: set_record + set_records May 2, 2019
@muddi900
Copy link
Contributor

This would be simple to implement, but it would be very slow.

We can add a mapping of the columns when the get_all_records are generated to the index.

Most of the work would be validation and testing.

@alifeee
Copy link
Collaborator

alifeee commented Jul 1, 2024

The way I could see this feature working is similar to Python's CSV DictWriter class

def append_records(
  headers: List[str],
  records: List[Dict[str, str]],
  default_blank: Any = None,
  ignore_extra_dict_items: bool = False,
  # extra kwargs?
)
  # use a provided "headers" array, probably obtained with worksheet.row_values(1)
  # new_rows = []
  # for each object in records
    # for each header in headers
      # if header not in headers, raise error! (unless kwarg)
      # if header not in object, raise error! (unless default blank)
      # new_rows[i].append(object[header])
  # use worksheet.append_rows to add new_rows to worksheet

and append_record (singular) would call append_records with a wrapped record, like [record] (like append_row and appeend_rows)

Questions I would have:

  • should we provide value input option kwargs?
  • should we only offer starting from column 1? (probably easiest)

This would be simple to implement, but it would be very slow.

slow? not sure what you mean

We can add a mapping of the columns when the get_all_records are generated to the index.

also not sure what you mean

if you desire to implement it, I think this could be a nice feature @muddi900 :)

@muddi900
Copy link
Contributor

muddi900 commented Jul 1, 2024

It would be slow in the sense that the order may not be correct each time. So we have to make sure the order is correct. Or use memoization: create an list of None the size of the record, loop through the whole record each time and fill the appropriate value according column mapping.

@alifeee
Copy link
Collaborator

alifeee commented Jul 2, 2024

not sure what you mean? dictionaries do not have an order as I understand it (or should not)

So I would see the list being created with something like

default_blank = None
ignore_extra_headers = False
data = {"fruit": "Apple", "tastiness": "very tasty"}
headers = ["fruit", "tastiness"]

new_row = []
if not set(data).issubset(set(headers)) and ignore_extra_headers == False:
  raise ValueError("Extra headers found in data")
for key in headers:
  if key not in data and default_blank is None:
    raise ValueError(f"No key <{key}> found in data")
  else:
    new_row.append(default_blank)
    continue
  new_row.append(data[key])
# new_row is ["Apple", "very tasty"]

@lavigne958
Copy link
Collaborator

I agree with your code @alifeee , small detail here, remember that data is List[Dict[str,str]] as we have a dict per row.
it looks good and the slowness here is depending on the length of data, with 1M rows of course it would take some time to run. Though I don't think a sheet can hold 1M rows 🤔 (may be it does ? )

so far we need to wrap the above code in a for loop iterating on data and run that for each dict found in the list data.
it takes sometime to run if we have a lot of rows in data but nothing to long I believe.

@alifeee
Copy link
Collaborator

alifeee commented Jul 5, 2024

yes. my example, effectively, was an implementation of set_record (singular data), instead of set_records (plural data)

I think slowness will not be a problem. We will see if it is after making the code.

@lavigne958 lavigne958 added this to the 6.2.0 milestone Jul 5, 2024
@lavigne958
Copy link
Collaborator

alright then, let's put it in the next milestone.

muddi900 added a commit to muddi900/gspread that referenced this issue Jul 17, 2024
@muddi900
Copy link
Contributor

Is there an efficient way to getting the last row of a data range?

Because we would have to call get_all_values every time with set_records

@lavigne958
Copy link
Collaborator

Is there an efficient way to getting the last row of a data range?

if you already know the range yes, if you don't know the range, like getting the last row in the worksheet without knowing the height of the worksheet -> then no you don't.

Because we would have to call get_all_values every time with set_records

not necessarily, depends what we want the it to do 🙃
from what I understand:

  • the method takes a dict value
    • the keys are: the header name
    • the values are: the value to append below each header
  • we need to read the 1st row so we know which column contains which header
  • then we push the data starting from row 2.

This can be improved by adding an extra argument:

  • starting_row: the row where we should start to append the data

or this can be done using the API appendRow which allows us to pass a row of data, the API will handle the "last row" problem for us and append the row bellow: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/append

this might be the easiest solution, though the extra input starting_row allow a user to choose where to put the data too.

@muddi900
Copy link
Contributor

If a person is using set_records, they are probably appending the data range, rather than the creating the first row.

I have implemented a basic solution locally.

def set_records(self, rows: List[Dict[str, Any]]) -> Response:
        cols = self.column_headers
        insert_rows = []
        for row in rows:
            insert_row = []
            for col in cols:
                insert_row.append(row[col])
            insert_rows.append(insert_row)

        url = SPREADSHEET_VALUES_APPEND_URL % (
            self.spreadsheet_id,
            f"'{self.title}'!A1",
        )
        resp = self.append_rows(
            insert_rows,
            value_input_option=ValueInputOption.user_entered,
        )

        return resp

    def set_record(self, row: Dict[str, Any]) -> None:
        self.set_records([row])

I also implemented a getter/setter for 'column_headers'.

 if client is None or not isinstance(client, HTTPClient):
            raise RuntimeError(
                """Missing HTTP Client, it must be provided with a
                valid instance of type gspread.http_client.HTTPClient .
                Please allocate new Worksheet object using method like:
                spreadsheet.get_worksheet(0)
                """
            )

        self.spreadsheet_id = spreadsheet_id
        self.client = client
        self._properties = properties

        # kept for backward compatibility - publicly available
        # do not use if possible.
        self._spreadsheet = spreadsheet
+        self._column_headers = []
+
+    @property
+    def column_headers(self) -> List[str]:
+        if not self._column_headers:
+            self._column_headers = self.row_values(1)
+        return self._column_headers
+
+    @column_headers.setter
+    def column_headers(self, value: List[str]) -> None:
+        self._column_headers = value

# ...

def get_all_records(
        self,
        head: int = 1,
        expected_headers: Optional[List[str]] = None,
        value_render_option: Optional[ValueRenderOption] = None,
        default_blank: str = "",
        numericise_ignore: Iterable[Union[str, int]] = [],
        allow_underscores_in_numeric_literals: bool = False,
        empty2zero: bool = False,
    ) -> List[Dict[str, Union[int, float, str]]]:
        """Returns a list of dictionaries, all of them having the contents of
        the spreadsheet with the head row as keys and each of these
        dictionaries holding the contents of subsequent rows of cells as
        values.

        This method uses the function :func:`gspread.utils.to_records` to build the resulting
        records. It mainly wraps around the function and handle the simplest use case
        using a header row (default = 1) and the the reste of the entire sheet.

        .. note::

           for any particular use-case, please get your dataset, your headers
           then use the function :func:`gspread.utils.to_records` to build the records.

        Cell values are numericised (strings that can be read as ints or floats
        are converted), unless specified in numericise_ignore

        :param int head: (optional) Determines which row to use as keys,
            starting from 1 following the numeration of the spreadsheet.
        :param list expected_headers: (optional) List of expected headers, they must be unique.

            .. note::

                returned dictionaries will contain all headers even if not included in this list
        :param value_render_option: (optional) Determines how values should
            be rendered in the output. See `ValueRenderOption`_ in
            the Sheets API.
        :type value_render_option: :class:`~gspread.utils.ValueRenderOption`
        :param str default_blank: (optional) Determines which value to use for
            blank cells, defaults to empty string.
        :param list numericise_ignore: (optional) List of ints of indices of
            the columns (starting at 1) to ignore numericising, special use
            of ['all'] to ignore numericising on all columns.
        :param bool allow_underscores_in_numeric_literals: (optional) Allow
            underscores in numeric literals, as introduced in PEP 515
        :param bool empty2zero: (optional) Determines whether empty cells are
            converted to zeros when numericised, defaults to False.


        Examples::

            # Sheet data:
            #      A    B    C
            #
            # 1    A1   B2   C3
            # 2    A6   B7   C8
            # 3    A11  B12  C13

            # Read all rows from the sheet
            >>> worksheet.get_all_records()
            [
                {"A1": "A6", "B2": "B7", "C3": "C8"},
                {"A1": "A11", "B2": "B12", "C3": "C13"}
            ]
        """
        entire_sheet = self.get(
            value_render_option=value_render_option,
            pad_values=True,
        )
        if entire_sheet == [[]]:
            # see test_get_all_records_with_all_values_blank
            #  we don't know the length of the sheet so we return []
            return []

        keys = entire_sheet[head - 1]
+        self.column_headers = keys
        values = entire_sheet[head:]

        if expected_headers is None:
            # all headers must be unique
            header_row_is_unique = len(keys) == len(set(keys))
            if not header_row_is_unique:
                raise GSpreadException(
                    "the header row in the worksheet is not unique, "
                    "try passing 'expected_headers' to get_all_records"
                )
        else:
            # all expected headers must be unique
            expected_headers_are_unique = len(expected_headers) == len(
                set(expected_headers)
            )
            if not expected_headers_are_unique:
                raise GSpread

I will add error handling, header validation, docs and testing, but this works.

@lavigne958
Copy link
Collaborator

I am sorry but I disagree with you, we never know how the users will use the feature, some could be interested in inserting values starting at some specific rows. We can enforce the feature to only be able to append values. that's a design choice we need to think first.

@muddi900
Copy link
Contributor

I think it would better to have two sets of methods append_record(s) and insert_record(s). This would serve all use cases as I can see them. I'll add the commits to the PR.

@lavigne958
Copy link
Collaborator

I think it would better to have two sets of methods append_record(s) and insert_record(s). This would serve all use cases as I can see them. I'll add the commits to the PR.

that's a very good idea. let's go for this option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants